Disclaimer

The English language has no explicitly neuter personal pronoun. Many
people consider this an unfortunate omission. However, traditional proper
usage dictates that the personal pronoun "he" and its derivatives be used
when a neuter personal pronoun is required. I shall follow tradition, and
use "he," the Women's Liberation Movement notwithstanding. I do not mean
to offend with my use of "he," merely to express myself cleanly and easily.




Table  of  Contents 

ABSTRACT  .  2 

DEDICATION  .  3 

ACKNOWLEDGMENTS . 4 

DISCLAIMER  .  7 

TABLE  OF  CONTENTS . 8 

TABLE  OF  FIGURES . 10 

I.  INTRODUCTION . 11

A.  The  Problem . 12 

B.  The  Environment . 14 

C.  The  Programming  Language:  Syspal . 16 

D.  The  "Things" . 20 

E.  Related  Work . 22 

F.  Plan  for  the  Remainder  of  this  Presentation . 23 

II.  ANTECEDENTS . 24

A.  Honeywell's  Multics . 25

B.  Hewlett-Packard's  MPE/3000  .  31 

C.  Miscellaneous . 35 

1.  Unix . 36 

2.  Hydra . 37 

3.  Version  Maintenance  .  38 

D.  Summary . 39 

III.  DEFINITION  OF  A  "CATOAN-OBJECT" . 41

A.  Issues:  Containment,  and  Trust . 42 

1.  Containment  and  Catoan . 43 

2.  Trust  and  Catoan . 45 

B.  The  Basic  Object . 47 

1.  The  Operations  of  the  Basic  Object . 52 

2.  Comments  on  the  "_SET"  Operations . 57 

3.  Naming  and  the  DIRECTORY . 58 

4.  Storing  Data:  The  CONTENTS  .  60 

5.  Protection  and  Security . 62 

C.  A  Refined  Object . . 65 

1.  Protection  and  Security . 66 

2.  Cross-Referencing  .  70 



D.  Versioned  Objects . 73 

1.  Version  Naming . 74 

2.  Storing  and  Implementing  Versions  .  76 

3.  More  on  Version  Naming . 84 

E.  Summary . 86 

IV.  AN  EXAMPLE:  A  SYSPAL  PROGRAM  OBJECT  .  87 

A.  Motivation . 88 

B.  Definition . 90 

C.  Use . 94 

D.  Summary . 96 

V.  IMPLICATIONS  OF  MULTIPLE  NAMING  ENVIRONMENTS  .  97 

A.  Disjoint  Naming  Spaces  .  98 

B.  A  Standard  Interface  for  Filing  Systems . 100 

C.  Garbage  Collection . 103 

D.  Summary . 107 

VI.  SUMMARY,  AND  EVALUATION  OF  THE  PROPOSED  SOLUTION . 108 

A.  Summary . 109 

B.  Completeness . 111

C.  Trade-offs . 115 

D.  Remaining  Work . 118

APPENDIX  A . 120 


REFERENCES . 133


Table  of  Figures 


Figure  1:  Sample  Representations  of  Multics  Objects  .  29 

Figure  1a:  Directory . 29

Figure  1b:  Segment . 30

Figure  1c:  Link .

Figure  2:  Sample  Representation  of  an  MPE/3000  File  .  34 

Figure  3:  The  Basic  Catoan-Object  .  49

Figure  4:  Catoan-Object  with  CONTENTS  of  Type  "text" . 52

Figure  5:  An  Access  Control  List  Scheme  for  Catoan . 69 

Figure  6:  Additions  to  the  Basic  Object  for  Cross-Referencing  ...  72 

Figure  7:  Version  Naming  Hierarchy . 75 

Figure  8:  Additional  Information  and  Operations  for  Version  Maintenance . 78

Figure  9:  Definition  of  VERSION_GENERATING_PROCEDUREs . 80

Figure  10:  A  Syspal-Program  Object  .  91 

Figure  11:  Standard,  Minimal  Interface  for  a  Filing  System . 101 

Figure  12:  A  Module  Implementing  a  Stack . 127 




CHAPTER  ONE 


INTRODUCTION 


In this chapter, I describe the problem to which this report is
addressed. The environment which was assumed during my research is
described (including the types of computing systems at which the results
presented here are aimed), as are the assumptions about that environment.
The programming language used in the examples and descriptions in this
report is also briefly described. A short description of the entities in
the computing system which are addressed here is presented. I then discuss
related work, and present a plan for the remainder of the presentation.


I.A. The Problem.

How does one store and reference things in a computing system?
Especially, how does one store and reference things whose existence is
longer than that of the process which created them? What is the structure
of these "things" which are stored? How can they be manipulated? What are
the common characteristics of most of the "things" in a computing system?
Is there anything that can be done to those "things" which does not fit the
model of "common characteristics"?

One of the important trends in current computer science research is
data abstraction: programming using abstract data objects, whose
representation is not only of no concern to the user, but is forcibly
hidden from him.

When using a computing system, one usually wants to retain some data
for long periods of time. This requires some form of permanent storage on
the computing system, and a mechanism for accessing the data stored in the
permanent storage. Unfortunately, many abstraction languages ignore the
issue of permanence, retaining objects only for the life of the process
which created them. Yet, users want permanent storage of their objects.



Once an object exists for longer than the life of its creating process,
it is desirable to attach a human-usable, hopefully mnemonic, name to it.
Such a desire requires a managing program for the names and objects: to
translate names to internal object references, to provide a uniform
semantic interpretation for the names, and to manage the stored objects.

Classically, in order to permanently store an (abstract) object, and to
attach a name to it, the object had to be transformed from its internal
representation to some external representation (like a stream of bits).
This external representation was then passed to a "file system," which
stored the stream of bits representing the abstract object in a "file."
Usually, the conversion from internal representation to external
representation was very visible to the user. Such a transformation is
undesirable, as it negates some of the benefits of data abstraction
techniques.

In this thesis report, I shall address these, and other, issues. I
shall describe the "things" stored in a computing system, and how one might
manipulate, define, and characterize them. I shall compare and contrast
this work with that of other schemes for referencing and manipulating
"things." I shall examine how the definition of the "things" affects their
naming and other properties.



I.B. The Environment.

Described here is a scheme aimed at a range of environments. It will
work equally well on single user computing systems and on multiple user,
shared systems. Often, on single user systems, some of the problems of
concurrent accessing and of protection become moot points, and so the focus
of this report will be on shared systems.

A virtual machine is similar to both the single and multiple user
systems. Within one process or collection of processes, it appears to be
single user. However, many virtual machines running on the same real
machine often share logical, as well as physical, resources. For example,
multiple virtual machines may share the same file system for permanent
storage, thereby sharing not only the physical storage devices but also the
logical naming space. The scheme presented in the following chapters will
also fill the needs of a virtual machine environment.

Loosely connecting autonomous systems together to form a network of
computers presents some problems which I shall not address. For example,
there are the problems of naming resources on remote systems, locating
resources on remote systems, and network-wide sharing and protection. It
is hoped, however, that the general network case is a simple extension of
the work described here for a single, multiple user system.

The specific environment assumed in this work is a single, multi-user
computing system, with a large address space (for example, at least a
trillion bits). Storage entities are accessed by presenting a unique
identifier for the entity (such as an address, a segment number, or a
capability) and an address within the entity to the memory management
system, which is responsible for the allocation of and access to the memory
resource. Within each entity in the system, references to other entities
may exist, and they may exist anywhere within the entity (rather than in
some particular location within the entity).

The memory resource is presumed to be virtual, though it could be
entirely real memory, provided there is a sufficiently large non-volatile
component. Permanent storage of an entity is achieved by not deleting the
entity; future accessing must be done with the unique identifier used to
create the entity. Memory appears to be single level; all entities exist
in the same collection of memory. In particular, the notion of separate
permanent and temporary memories is foreign to my presumed environment.
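
The addressing model described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class name MemoryManager and its methods are hypothetical and do not come from this report, and entities are modelled as small byte arrays. An entity is reached by presenting its unique identifier together with an address within the entity, and it persists simply because it has not been deleted.

```python
import itertools


class MemoryManager:
    """Hypothetical single-level store: every entity lives in one collection."""

    def __init__(self):
        self._entities = {}                 # unique identifier -> storage
        self._next_id = itertools.count(1)  # source of unique identifiers

    def create(self, size):
        """Allocate an entity of `size` bytes and return its unique identifier."""
        uid = next(self._next_id)
        self._entities[uid] = bytearray(size)
        return uid

    def write(self, uid, address, value):
        """Store a byte value at `address` within the entity named by `uid`."""
        self._entities[uid][address] = value

    def read(self, uid, address):
        """Fetch the byte at `address` within the entity named by `uid`."""
        return self._entities[uid][address]

    def delete(self, uid):
        """Deleting is the only way an entity stops being 'permanent'."""
        del self._entities[uid]


manager = MemoryManager()
uid = manager.create(8)
manager.write(uid, 3, 42)
value = manager.read(uid, 3)   # later access uses the same identifier
```

Note that there is no separate "save" step in the sketch: nothing distinguishes temporary from permanent entities except whether delete has been called.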

In my assumed system environment, security is a major concern. An
objective is to minimize the number of trusted components in the system.
By "trust" I mean to give access to one's data, when that access is not for
reason of explicit use. In most existing systems, the filing system is
trusted; it can delete, modify, make inaccessible, or leak the data in
any file in the system. In the proposals following, the filing system
(object manager) need not be trusted to not modify or leak data. (It will
still be able to delete data, and to make them inaccessible.) The only
component of the system that will have to be trusted with one's data is the
memory management system, which deals with data on a bit (or collection of
bits) level, and can place data in any address space in the system. (If a
single-level, non-volatile storage system is used, the memory manager need
not have the "power" it would in a multiple-level, volatile (virtual)
storage system.)

An additional aspect of my presumed environment is that the operating
system provided is a kernel, to which some user-environment features have
been added. The user-environment features need not be used if one desires
to write a replacement (or simply do without the feature). The filing
system provided with the kernel is part of the optional section of the
system; therefore, multiple filing systems could exist.

I.C.  The  Programming  Language:  Syspal. 

The examples presented in the following chapters use the "Syspal"
programming language. Syspal [10] is a Pascal-based systems programming
language being developed at Hewlett-Packard's Computer Research Laboratory.
Syspal is an object-oriented language, similar to MIT's CLU [21, 22] or
Carnegie-Mellon's Alphard [34, 35, 37]. One defines an object by defining
the operations one can perform on the object; the actual realization of the
abstract object is not visible to its users. Following is a short summary
of some of the features of Syspal which are used in this report; a summary
of the relevant features of Syspal is in Appendix A.

Syspal provides only a very few types, and allows the programmer to
extend those types. Specifically, Syspal includes no "string" for direct
use. Throughout this report, strings will have the representation

    string(size: 0 TO 100) = TYPE RECORD
        length: 0 TO size;
        chars: ARRAY(1 TO size) OF CHAR;
    END; !string

with all the usual string operations defined.

The definition of STRING points out several features of Syspal.
Defined types can take one or more parameters which further specify the
type. The string definition shown above takes "size" as its parameter,
specifying the length of the string. The statement

    life_history: string(50);

declares a variable as a string of length fifty.

There are two kinds of comments: a "here to end of line" comment
(denoted by "!"), and a "here to end of comment" comment (which uses "(*"
to open the comment and "*)" to close it).

Syspal allows pointers to be declared. Pointers are typed; that is, a
pointer refers to an object of some particular type, rather than a pointer
to anything (PL/I pointers are of the latter flavor). As an example, the
following could be the representation of a list. Like strings, lists are
parameter-based: the type of the list's elements is supplied by the
"abstraction" user.

    list(element_type: TYPE) = TYPE RECORD
        first: @element_type;
        rest: @list(element_type);
    END; !list

The field "first" is a pointer to an object of type "element_type"; "rest"
is a pointer to a list of type "element_type."

As a further example, shown in Figure 12 in Appendix A is a definition
of a STACK abstraction which takes, as its parameters, the type of the
objects on the stack, and the number of elements the stack will be able to
contain. The definition takes the form of a "module," the Syspal
equivalent of the CLU "cluster." The operations on stacks, a
representation of a stack, and various "interfaces" for, or "means of
referencing," stacks are shown.


Within a module, the keyword SELF is bound to the object on which the
operation was called. SELF is not included in the header of the function,
but is supplied as the first argument to the operation when it is called.
The name of the module need not be provided in the CALL statement; it is
recognized from the type of the first argument. For example, with the
declarations

    envr: stack(algol_stack_frame);
    x: algol_stack_frame;
    algol_stack_frame = TYPE ... ;

CLU would require a CALL similar to

    CALL stack$push(envr, x);

whereas in Syspal, the same statement would be

    CALL push(envr, x);

or, optionally,

    CALL stack.push(envr, x);

if the fully-qualified operation name was desired. Within the module
implementing stacks, "SELF" would refer to "envr" in the above example.
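
For readers more familiar with current languages, the calling convention above can be approximated in Python. This is a loose analogy rather than Syspal semantics: the Stack class, its capacity parameter, and the free function push are all hypothetical names introduced for illustration. The free function finds the implementing class from the type of its first argument, much as Syspal recognizes the module from the first argument of the CALL.

```python
class Stack:
    """Hypothetical bounded stack standing in for the Syspal stack module."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = []

    def push(self, item):
        # `self` plays the role of SELF: the object the operation was called on.
        if len(self.items) >= self.capacity:
            raise OverflowError("stack is full")
        self.items.append(item)


def push(obj, item):
    """Unqualified call: select the operation from the type of the first argument."""
    return type(obj).push(obj, item)


envr = Stack(capacity=2)
push(envr, "frame-1")         # comparable to: CALL push(envr, x);
Stack.push(envr, "frame-2")   # comparable to: CALL stack.push(envr, x);
```

In both calls the object envr is what the body of the operation sees as its first parameter.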
I.D. The "Things."

At the beginning of this chapter, I referred to the "things," the
entities, stored in a computing system. What are those "things"? What are
their properties, what is their structure, what operations can be performed
on them?

The "things" to which I refer are the abstract data objects which are
stored in the computing system's long-term ("permanent") storage. Such
objects may be viewed as files, segments, programs, hierarchical or
relational databases, or whatever else one might want to retain for long
periods of time. The various kinds of objects are defined by the operations
which can be performed on them, in addition to those which can be performed
on ALL objects. Most existing permanent-storage systems do not take this
view, but, rather, view storage as a collection or stream of bits or bytes,
or possibly as an array of "records." Indeed, some of the reports on
current research on storage systems take a byte-stream view of storage,
when such a view is not necessary (see, for example, [20]).

The view of objects as abstractions is similar to that which CLU,
Alphard, Smalltalk [12], and Syspal take of data. An object is an abstract
data type, out of which other abstract data types are made. An example of
this is building a first-in, first-out queue from a linked list. The
programmer implementing the queue is not concerned with the implementation
of the list abstraction, merely with the definition of the operations of
the list (FIRST, REST, APPEND). If the input-output specifications of the
operations on lists remain the same, changing the implementation of lists
does not matter. Perhaps the person maintaining lists may decide that
lists larger than some critical size should be stored using a different
format; the user of lists does not care about internal representation.
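
The queue example can be made concrete with a short Python sketch. It is an illustration under assumptions, not code from this report: the class names are hypothetical, the list is given an immutable tuple as its hidden representation, and an is_empty operation is added for convenience alongside the FIRST, REST, and APPEND operations named above.

```python
class LinkedList:
    """Hypothetical list abstraction; callers see only its operations."""

    def __init__(self, items=()):
        self._items = tuple(items)   # hidden representation, free to change

    def first(self):
        if not self._items:
            raise IndexError("empty list")
        return self._items[0]

    def rest(self):
        return LinkedList(self._items[1:])

    def append(self, item):
        return LinkedList(self._items + (item,))

    def is_empty(self):              # assumed extra operation
        return not self._items


class Queue:
    """First-in, first-out queue built only from the list's operations."""

    def __init__(self):
        self._list = LinkedList()

    def enqueue(self, item):
        self._list = self._list.append(item)

    def dequeue(self):
        item = self._list.first()
        self._list = self._list.rest()
        return item

    def is_empty(self):
        return self._list.is_empty()
```

Because Queue touches the list only through first, rest, append, and is_empty, the tuple inside LinkedList could be replaced by chained cells or by a different format for large lists without any change to Queue.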

Syspal provides the abstractions ARRAY, RECORD, INTEGER, CHAR and BOOL
for direct use. And yet, one is not concerned with the implementation of
such things; one merely wants to use them, often, as here, to build other,
more complicated abstractions.

(In addition to the languages mentioned above taking a view of objects
similar to mine, Hydra [36] has a similar view of objects which are to be
stored for long periods of time. Again, objects are abstract (and
explicitly extensible). There are other similarities between the Hydra
view of objects and mine; these will be mentioned later, as appropriate.)

More details on abstract data types can be found in the previously
cited references on CLU and Alphard.
I.E. Related Work.

The work which has most influenced my thinking about object management
has been the research on data abstractions. Much of this work has its
origins in SIMULA [6]. Parnas describes abstraction techniques [23]; CLU,
Alphard, and Syspal all embody these concepts. It was the desire to store
objects, rather than files, and to view storage as a collection of abstract
data, rather than as bit or byte strings, which motivated this research.

The file systems of Honeywell's Multics [15], Bell Labs' Unix [26, 29,
32], and Hewlett-Packard's MPE/3000 [13] helped me determine the
characteristics of the objects stored in a computing system. The naming
structure is derived directly from Multics. Hydra's file system [36] views
objects in a manner similar to that presented here.

Many of my thoughts on protection were also influenced by Multics. The
capability-based schemes described by Wulf (Hydra, [36]), Lampson and
Sturgis (Cal, [19]), and Saltzer [28] provided an interesting alternative
to the Multics Access Control List (also described in [28], and in [15]).

Various mechanisms have been developed for version maintenance. Most
of them simply store the object as a linear sequence of complete versions
(for example, TENEX [7], ITS [9], and OS/VS1 [16, 17]). The Source Code
Control System (SCCS) [5, 11, 27], part of Unix's Programmer's WorkBench
[8, 18, 29], implements a novel way of maintaining versions as a set of
updates. SCCS also allows a (limited) hierarchy of versions. The scheme I
propose is an immediate extension of that embodied in SCCS.

I.F.  Plan  for  the  Remainder  of  this  Presentation. 

In the following chapters, I describe "Catoan" (pronounced ku-ton' (1)),
an object-oriented filing system for large, multi-user computing
systems. Chapter Two describes previous work which influenced my thinking,
especially about those attributes which are common to all permanently
stored objects in a computing system such as the one I assume. In Chapter
Three, my view of a "basic" object is developed, followed by a discussion
of a "refined" object and a "versioned" object. In Chapter Four, I present
an example of how one might use Catoan to store a Syspal program. Chapter
Five examines the problems which arise when other filing systems, and,
therefore, other naming schemes and spaces, are allowed to co-exist with
Catoan. The final chapter, Chapter Six, contains an evaluation of Catoan,
and describes areas where further research is needed.


(1) Notation from Webster's New World Dictionary of the American Language.


CHAPTER TWO


ANTECEDENTS


In this chapter, I shall discuss previous work which had a large
influence on my research and thinking. The systems discussed here were
studied as examples of ways to manage particular kinds of objects.

The typical kind of object in each of these systems is the "classical
file," often appearing under different names (such as "segment"). A
"classical file" is presented to the user as a string or stream of bits or
characters. It does not have any structure, save in the way in which it is
interpreted by the user. Usually, files are stored as blocks of contiguous
bits, along with some system overhead information.

Sample representations of the files in Multics and MPE/3000 will be
described using Syspal notation.
II.A. Honeywell's Multics.

The Multics file system is described abstractly by Saltzer [28], and
concretely in the Multics Programmer's Manual [15]. Here, those features
which most influenced this work are described.

There are two major kinds of objects in the Multics file system:
"directories" and "segments." Directories contain mappings of
character-string names to object references (unique identifiers); the
objects can be either segments or other directories. Segments contain the
data stored in the system. In addition to directories and segments, there
are also "links" and "multi-segment files"; these will be discussed only
briefly.

The objects in the Multics file system are arranged in a hierarchical
fashion, starting from a directory called the "ROOT." Directories can be
either nodes or leaves (generally, they are nodes; only an empty directory
can be a leaf); segments must be leaves. Any object in the hierarchy can
be named directly, by specifying the names of all the containing
directories in order, starting from the ROOT. For example, the payroll for
the month of June might be specified, using "~" as a name separator, as
"ROOT~Accounting~payrolls~June" (assuming that the payroll function is part
of the accounting department).

In addition to specifying a fully-qualified name (like that in the
previous paragraph), local names are allowed, with the system automatically
supplying the higher levels of qualification. This requires a slight
change in the form of fully-qualified, or global, names: if the search for
an object is to start at the ROOT, the first component of the name is not
supplied, thereby beginning the name with the separator character.
Therefore, the above example would become "~Accounting~payrolls~June"; a
user executing in the "Accounting" (beneath the ROOT) directory could
reference the same segment with "payrolls~June," and someone in the
"Accounting~payrolls" directory (again, beneath the ROOT) could use simply
"June."

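The naming rule above (a name beginning with the separator is global; any other name is local and is qualified by the current directory) can be sketched as follows. This is an illustrative Python fragment, not Multics code: the function names are hypothetical, and the hierarchy is modelled as nested dictionaries with a string standing in for the segment.

```python
SEPARATOR = "~"


def to_global_name(name, working_directory):
    """Supply the higher levels of qualification for a local name."""
    if name.startswith(SEPARATOR):
        return name                  # already global: search starts at the ROOT
    return working_directory + SEPARATOR + name


def resolve(root, global_name):
    """Walk the hierarchy from the ROOT, one name component at a time."""
    node = root
    # The leading separator yields an empty first component, which is skipped.
    for component in global_name.split(SEPARATOR)[1:]:
        node = node[component]
    return node


# Hypothetical hierarchy matching the payroll example in the text.
root = {"Accounting": {"payrolls": {"June": "segment holding the June payroll"}}}

name_from_accounting = to_global_name("payrolls~June", "~Accounting")
name_from_payrolls = to_global_name("June", "~Accounting~payrolls")
```

Both local names expand to the same global name, so resolve reaches the same segment from either working directory.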
Each object in the file system has some system information associated
with it. Some of this information is part of all the types of objects;
some of it is object-type particular. An example representation of a
Multics directory, segment, and link appears in Figure 1. The most
interesting parts of this information concern protection and sharing: the
"access_control_list" and "ring_brackets." The access control list
specifies the types of access granted to each user in the system.
Directory access types are search (look in the directory), modify (change
entries in the directory), and append (add entries to the directory);
segment access types are read (get the contents of the segment), execute
(interpret the segment as a program), and write (change the contents of the
segment). The ring_brackets specify the position in the system's protection
rings (an extension of the supervisor-user mode concept; see [28]) in
which the object can be accessed.
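
A check against an access control list of this kind might look like the Python sketch below. It is a simplification under assumptions: the function name, the principal identifiers, and the dictionary layout are hypothetical, and ring brackets are not modelled at all. Only the mode letters are taken from the text.

```python
# Mode letters from the text: search/modify/append for directories,
# read/execute/write for segments.
DIRECTORY_MODES = frozenset("sma")
SEGMENT_MODES = frozenset("rew")


def is_permitted(acl, principal, mode, valid_modes):
    """Return True if `principal` has been granted `mode` on the object."""
    if mode not in valid_modes:
        raise ValueError("mode %r does not apply to this kind of object" % mode)
    # A principal with no entry on the list is granted nothing.
    return mode in acl.get(principal, frozenset())


# Hypothetical list for the June payroll segment.
payroll_acl = {
    "Jones.Accounting": frozenset("rw"),
    "Smith.CorpMgt": frozenset("r"),
}

can_read = is_permitted(payroll_acl, "Smith.CorpMgt", "r", SEGMENT_MODES)
can_write = is_permitted(payroll_acl, "Smith.CorpMgt", "w", SEGMENT_MODES)
```

Keeping the directory and segment mode sets separate mirrors the point that some of the protection information is particular to the type of object.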

The Multics file system implements a strict hierarchy; therefore, each
object in the system has exactly one parent (1), though directories can
have multiple children. To allow an object to appear to exist in more than
one directory, Multics provides "links." A link is a mapping of a local
(one component) name to a global (fully qualified) name. Returning to the
above payroll example, assume that top-level management wanted to access
the payroll files, and desired to do so directly, rather than through the
entire ~Accounting~payrolls~June name. A link might be created in the
CorpMgt directory called "JunePay," which would be mapped into the name
"~Accounting~payrolls~June."


(1) This is true for all objects in the system except the ROOT, which has
no parent.


An important point about Multics links is that they map local names to
global names, not local names to object references. Such links are called
"soft" links (1); their resolution is a two-step process: resolving the
local name to a global name, and then resolving the global name to a unique
internal identifier (segment number). This position need not be taken;
Unix, for example, links local names directly to object references (see
Section II.C.1).


(1) A "hard" link maps a local name directly to an object reference.


A multi-segment file allows more than one segment's worth of data in
one object (segments have a limited size). A multi-segment file appears to
be very similar to a normal segment, though it is implemented as a
directory, with the segments comprising the multi-segment file as children
of the directory.
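
The difference between the two kinds of link can be illustrated with a small Python sketch. The table layouts, function names, and identifier values here are hypothetical and chosen only for illustration; they are not taken from Multics or Unix.

```python
def resolve_soft(directory, local_name, soft_links, name_table):
    """Two-step resolution: local name -> global name -> unique identifier."""
    global_name = soft_links[(directory, local_name)]   # step 1
    return name_table[global_name]                      # step 2


def resolve_hard(directory, local_name, hard_links):
    """One-step resolution: local name -> unique identifier."""
    return hard_links[(directory, local_name)]


# Global names mapped to (made-up) unique identifiers.
name_table = {"~Accounting~payrolls~June": 1047}

# The same local name, "JunePay" in CorpMgt, recorded both ways.
soft_links = {("~CorpMgt", "JunePay"): "~Accounting~payrolls~June"}
hard_links = {("~CorpMgt", "JunePay"): 1047}

before_soft = resolve_soft("~CorpMgt", "JunePay", soft_links, name_table)
before_hard = resolve_hard("~CorpMgt", "JunePay", hard_links)

# Suppose a different object is later stored under the same global name.
name_table["~Accounting~payrolls~June"] = 2001

after_soft = resolve_soft("~CorpMgt", "JunePay", soft_links, name_table)
after_hard = resolve_hard("~CorpMgt", "JunePay", hard_links)
```

In this sketch the soft link follows the name to whatever object currently bears it, while the hard link continues to refer to the original object.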
multics_directory = TYPE RECORD
    (* Defined types (such as ACCESS_ID) are shown in Figure 1c. *)
    access_class: string(32);               !E.g., Classified, Top Secret.
    access_control_list: ARRAY(*) OF RECORD
        id: access_id;                      !Principal identifier.
        modes: RECORD
            (s, m, a): BOOL;                !Search, Modify, Append.
        END; !modes
    END; !access_control_list
    author: access_id;
    current_length: INTEGER;                !Number of pages.
    (date_time_dumped,
     date_time_entry_modified,
     date_time_modified,
     date_time_salvaged,
     date_time_used): multics_date_time;
    initial_access_control_lists: RECORD
        segment: LIKE multics_segment.acl;
        directory: LIKE multics_directory.acl;
    END; !initial_access_control_list
    multisegment_file_indicator: INTEGER;   !Segments in multi-segment
                                            !file; 0 if not msf.
    names: ARRAY(*) OF string(32);          !Names of this directory.
    quota: INTEGER;                         !Pages allowed under directory.
    records_used: INTEGER;                  !Secondary storage.
    ring_brackets: RECORD                   !Rings of protection.
        (m_a, s): rings;
    END; !ring_brackets
    safety_switch: BOOL;                    !Query user upon DELETE?
    security_out_of_service_switch: BOOL;   !Access class discrepancy
                                            !has been detected.
    type: ARRAY(3) OF BOOL (*segment, directory, link*) :=
        (FALSE, TRUE, FALSE);
    unique_id: INTEGER;
    name_map: ARRAY(*) OF RECORD            !Segments under this directory.
        name: string(32);
        object: UNION(@multics_directory,
                      @multics_segment,
                      @multics_link);
    END; !name_map
END; !multics_directory


Figure 1a: Sample Representation of a Multics Directory.
multics_segment = TYPE RECORD
    access_class: string(32);               !E.g., Classified, Top Secret.
    access_control_list: ARRAY(*) OF RECORD
        id: access_id;                      !Principal identifier.
        modes: RECORD
            (r, e, w): BOOL;                !Read, Execute, Write.
        END; !modes
    END; !access_control_list
    author: access_id;
    bit_count: INTEGER;
    bit_count_author: access_id;            !Principal who last set BIT COUNT.
    copy_switch: BOOL;                      !Copy on write?
    current_length: INTEGER;                !Number of pages.
    (date_time_dumped,
     date_time_entry_modified,
     date_time_modified,
     date_time_used): multics_date_time;
    maximum_length: 0 TO 262144;            !256K words.
    names: ARRAY(*) OF string(32);          !Names of this segment.
    records_used: INTEGER;                  !Secondary storage.
    ring_brackets: RECORD                   !Rings of protection.
        (w, r, e): rings;
    END; !ring_brackets
    safety_switch: BOOL;                    !Query user upon DELETE?
    type: ARRAY(3) OF BOOL (*segment, directory, link*) :=
        (TRUE, FALSE, FALSE);
    unique_id: INTEGER;
    contents: ARRAY(262144) OF data_word;   !256K words.
END; !multics_segment


Figure 1b: Sample Representation of a Multics Segment.
multics_link = TYPE RECORD
    author: access_id;
    (date_time_dumped,
     date_time_entry_modified,
     date_time_used): multics_date_time;
    names: ARRAY(*) OF string(32);          !Names of this link.
    type: ARRAY(3) OF BOOL (*segment, directory, link*) :=
        (FALSE, FALSE, TRUE);
    unique_id: INTEGER;
    linked_to_path: string(168);
END; !multics_link


access_id = TYPE RECORD
    person: string(15);
    project: string(15);
    instance: string(1);
END (*access_id*);

rings = TYPE DISTINCT 0 TO 7;               !Rings of protection.

multics_date_time = TYPE 0 TO 2**64-1;      !Microseconds since
                                            !January 1, 1901 00:00 GMT.

data_word = TYPE 0 TO 2**36-1;


Figure 1c: Sample Representation of a Multics Link.


II.B. Hewlett-Packard's MPE/3000.

I examined the MPE/3000 file system as an example of a
"limited-hierarchical" file system. Users cannot create their own
directories. Rather, the naming hierarchy is a fixed three-level system:
file_name (segment name), group_name, account_name. The file_name is the
"lowest" level name; the account_name, the "highest." If a higher level is
specified, all lower levels must also be specified. There is a very strict
rule for interpreting names: a one-level name is extended with the current
group and account; a two-level name is extended with the current account.
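
The interpretation rule can be sketched in a few lines of Python. This is
only an illustration under stated assumptions: the dotted
"file.group.account" spelling, the FullName type, and the resolve_name
function are all hypothetical names chosen for the example, not taken from
the MPE/3000 documentation.

```python
from typing import NamedTuple


class FullName(NamedTuple):
    """A fully qualified three-level name (hypothetical type for this sketch)."""
    file_name: str
    group_name: str
    account_name: str


def resolve_name(name: str, current_group: str, current_account: str) -> FullName:
    """Extend a one-, two-, or three-level name to a full three-level name.

    Assumes levels are written lowest-first and separated by dots.
    """
    parts = name.split(".")
    if not 1 <= len(parts) <= 3 or any(part == "" for part in parts):
        raise ValueError(f"malformed name: {name!r}")
    if len(parts) == 1:
        # One-level name: extended with the current group and account.
        return FullName(parts[0], current_group, current_account)
    if len(parts) == 2:
        # Two-level name: extended with the current account only.
        return FullName(parts[0], parts[1], current_account)
    # Three-level name: already complete.
    return FullName(parts[0], parts[1], parts[2])
```

Because a higher level can be given only when all lower levels are given,
the number of components alone determines which defaults apply; no search
of the hierarchy is needed.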

A  process  executes  under  exactly  one  account  and  one  group  within  that 
account  for  its  entire  lifetime;  the  notion  of  changing  the  "working 
directory"  of  the  process  does  not  exist. 

Segments  can  be  created  only  in  the  process's  current  group  within  the 
current  account.  Segments  exist  in  exactly  one  place  in  the  hierarchy,  and 
have  exactly  one  name;  neither  soft  nor  hard  links  exist.  To  reference  a 
segment  by  another  name,  it  must  be  renamed  (if  staying  in  the  same  group 
and  account)  or  copied  (in  which  case  it  becomes  an  entirely  new  entity). 

Security is specified in two ways: with an aggregate-level access
control list (called the "security matrix"), and with a password
(lockword). The latter, if required, must be supplied whenever the segment
is "opened" (made ready for use) or deleted. The security matrix is
checked at times similar to those when the password is checked, and
specifies the types of access various groups of users are granted.

The access types which can be granted are: read, append (write at the
end of the segment), write (anywhere in the segment), lock (access the
segment exclusively), and execute. The groups are: any (anyone in the
system), account user (anyone in the same account), account librarian (an
account member deemed responsible for all the segments in an account),
group user, group librarian, and creator. In addition, the "account
manager" (a user who is responsible for administration of the account) has
access to all the segments in that account, and the "system manager" (a
user who is responsible for administration of all the accounts in the
system) has access to all segments in the system.

Figure 2 shows a sample representation of a file in the MPE/3000 file
system. This representation is rather abstract, and incomplete in detail.
More detail can be found in [13].

HP3000_MPE_file = TYPE RECORD
    label: RECORD
        name: fname;     !File name.    --|
        group: fname;    !Group name.     |-- Full file name.
        account: fname;  !Account name. --|
        creator: fname;
        lockword: UNION(null, fname);  !Must be supplied at OPEN
                                       !if non-NULL.
        security_matrix: ARRAY(5) OF RECORD  !Who can access file.
            (* Subscripts: 1 - read   2 - append
                           3 - write  4 - lock   5 - execute. *)
            (any,
             account_user,
             account_librarian,
             group_user,
             group_librarian,
             creator): BOOL;
        END (*security*);
        secure: BOOL;  !Is SECURITY_MATRIX enforced?
        date_created: julian_date;
        date_accessed: julian_date;
        date_modified: julian_date;
        file_type: word;  !Type (e.g. program, APL workspace).
        access_flags: RECORD  !How file is being accessed.
            store: BOOL;      !File being backed-up to tape.
            restore: BOOL;    !File being recovered from tape.
            load: BOOL;       !Memory-resident program file.
            exclusive: BOOL;  !Opened for exclusive use.
        END;  !access_flags
        how_open: RECORD
            write: BOOL;
            read: BOOL;
        END;  !how_open
        user_labels_written: halfword;
        user_labels_max: halfword;
        max_records: dbl_word;
        private_volume_info: bit_string(32);
        logical_record_size: word;
        block_size: word;
        last_block_size: word;
        records_in_file: dbl_word;
    END;  !label
    data: ARRAY(1 TO 2**47) OF CHAR;
END;  !mpe_file

fname = TYPE alpha_string(8);
alpha_string(size: 0 TO 100) = TYPE RECORD
    length: 0 TO 100;
    char1: letters;
    charn: ARRAY(2 TO size) OF UNION(letters, "0" TO "9");
END;  !alpha_string

letters = TYPE UNION("a" TO "z", "A" TO "Z");
halfword = TYPE 0 TO 255;
word = TYPE 0 TO 32767;
dbl_word = TYPE 0 TO 2147482711;
julian_date = TYPE RECORD
    year: 0 TO 99;
    day: 0 TO 366;
END;  !julian_date


Figure  2:  Sample  Representation  of  an  MPE/3000  File. 


II.C. Miscellaneous.

In addition to the file systems of Multics and MPE/3000, various other
file systems influenced my thinking on Catoan. Unix influenced my ideas on
links and the structure of the naming environment (that is, whether to use
a hierarchy or a network). Hydra's form of objects proved interesting.
TENEX's file system supplies a form of version maintenance, as do those of
ITS, OS/VS1, and many others. This section presents the various systems
which were investigated and which made some (at least minor) contributions
to this work.

II.C.1. Unix.

The Unix file system is similar to the Multics file system. Like
Multics, Unix provides a hierarchical file system, with an access control
list protection scheme. However, the hierarchy is not strict, and the
access control list is more coarse than that of the Multics system.


Like Multics, Unix has, conceptually, two types of objects: directories
and segments (files). However, unlike Multics, Unix segments can have
multiple parents. Also, links in Unix are "hard" links (those in Multics
are called "soft"). The local name is translated directly to a unique
identifier (segment number, or "i-node" in Unix terminology), without the
intervening global name. This is a more efficient form of link (it skips
the additional name resolution step (1) when following the link), but that
is relatively unimportant. Soft links provide greater indirection
facilities than do hard links (because they can be bound to another link).
Hard links, though, provide a known interpretation of a link, and make it
easier for the owner of a segment to determine all the people using it.
Implementing a complete cross-reference with soft links, for example, would
require that the link be completely traced when it was created; in a hard
link system, the link is directly resolved.


(1) Or steps: a soft link can bind a local name to another soft link.

Although Unix segments can have multiple parents (can be contained in
multiple directories), directories cannot. This precludes building a
general network in the Unix file system.
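
The difference between the two kinds of link can be made concrete with a
small Python sketch. It is a simplification under stated assumptions: the
whole naming environment is flattened into one dictionary from names to
entries, and the HardLink, SoftLink, and resolve names are hypothetical,
chosen for this illustration rather than drawn from Unix or Multics.

```python
from dataclasses import dataclass
from typing import Dict, Tuple, Union


@dataclass(frozen=True)
class HardLink:
    """Binds a name directly to a unique identifier (segment number)."""
    segment_id: int


@dataclass(frozen=True)
class SoftLink:
    """Binds a name to another (global) name, which must be resolved again."""
    path: str


Entry = Union[HardLink, SoftLink]


def resolve(path: str, namespace: Dict[str, Entry], max_hops: int = 16) -> Tuple[int, int]:
    """Return (segment_id, number of name resolutions performed)."""
    hops = 0
    while True:
        hops += 1
        entry = namespace[path]  # KeyError if the name is not bound
        if isinstance(entry, HardLink):
            return entry.segment_id, hops
        if hops >= max_hops:
            # A soft link may be bound to another soft link, so guard
            # against chains that never reach a segment.
            raise RuntimeError(f"too many soft links while resolving {path!r}")
        path = entry.path
```

A hard link always resolves in one step, while a soft link costs one extra
resolution per link in the chain; the same chain is what would have to be
traced to build a cross-reference at link-creation time.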

The  protection  scheme  in  Unix  allows  the  object  owner  to  specify  access 
for  certain  groups  of  users,  rather  than  on  a  user-by-user  basis.  The 
scheme  is  tied  to  the  accounting  system,  with  access  being  granted  to  the 
owner, to members of the owner's project (account), and to all users in the
system.  See  [26,  29,  32]  for  more  details. 


II.C.2. Hydra.

The Hydra file system [1, 13] stores Hydra-objects, which are
pseudo-abstract, and are each of a particular type or type extension. Each
Hydra-object (call one "CRL") has two parts: the data part, and the
"c-list." The actual data in CRL is stored in the data part. The c-list
contains references to Hydra-objects which are contained in CRL. Every
object in Hydra has both parts.

Because each Hydra-object has both a data and a c-list part, there is
need for only one kind of object, which can function as both a "segment"
and a "directory." However, one other important reason for including both
parts in all objects is that references to other objects cannot exist in
the data part, but only in the c-list.

II.C.3. Version Maintenance: TOPS-20, ITS, OS/VS1, and SCCS.

Version maintenance has been a topic of interest for some time.
TOPS-20 [7], ITS (1) [9], and OS/VS1 [16, 17] all provide similar forms of
version maintenance. All three systems store each version in its entirety
(as opposed to storing updates relative to some base version). Versions in
TOPS-20 and ITS are linear, time-ordered sequences, referenced by numbers
which increase from older to newer (more recent) versions. The default
version (the version obtained if none is explicitly specified) is always
the most recent version. The symbol ">" in ITS, and the (special) version
number 0 in TOPS-20, reference the latest version on read and create a new
version on write. The symbol "<" in ITS and the (special) version number
"-2" in TOPS-20 access the oldest version.


(1) ITS is an operating system developed at MIT for the PDP-10 family of
computers.


OS/VS1's version naming scheme differs from that of TOPS-20 and ITS.
It is a two-level system, allowing both a "generation" and a "version"
specification. The specification becomes a suffix of "GnnnnVmm" to the
regular file name, where "nnnn" is the "generation number" and "mm" is the
"version number." This provides a limited tree-structure for version
naming: generation within the file, and version within the generation. The
suffix "(0)" references the latest generation; "(+1)" creates a new
generation; "(-1)" references the previous generation, and "(-n)"
references the nth previous generation. The automatic version maintenance
system does not use the version field; it can be accessed directly by the
user, however.
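
The two naming schemes described in this section can be compared in a short
Python sketch. It is illustrative only, under stated assumptions: the
function names, the lists of existing version numbers, the choice of 1 as
the first version or generation number, and the default version field of 0
are all assumptions made for the example, not details of TOPS-20, ITS, or
OS/VS1.

```python
import re
from typing import Sequence, Union

# Special specifiers named in the text: newest-on-read / new-on-write,
# and oldest.
NEWEST = {"tops20": 0, "its": ">"}
OLDEST = {"tops20": -2, "its": "<"}


def resolve_linear(existing: Sequence[int], spec: Union[int, str],
                   system: str, writing: bool = False) -> int:
    """Resolve a TOPS-20 or ITS version specifier to a version number."""
    if system not in NEWEST:
        raise ValueError(f"unknown system: {system!r}")
    if spec == NEWEST[system]:
        if writing:
            # Writing through the "newest" specifier creates a new version.
            return max(existing) + 1 if existing else 1  # first number assumed
        if not existing:
            raise LookupError("no versions exist")
        return max(existing)
    if spec == OLDEST[system]:
        if not existing:
            raise LookupError("no versions exist")
        return min(existing)
    number = int(spec)  # an explicit version number
    if number not in existing:
        raise LookupError(f"no version {number}")
    return number


def osvs1_suffix(generation: int, version: int = 0) -> str:
    """Build the absolute "GnnnnVmm" suffix."""
    if not (0 <= generation <= 9999 and 0 <= version <= 99):
        raise ValueError("generation or version out of range")
    return f"G{generation:04d}V{version:02d}"


def resolve_generation(existing: Sequence[int], relative: str) -> int:
    """Resolve a relative suffix such as "(0)", "(+1)", or "(-2)"."""
    match = re.fullmatch(r"\(([+-]?\d+)\)", relative)
    if match is None:
        raise ValueError(f"malformed relative suffix: {relative!r}")
    offset = int(match.group(1))
    ordered = sorted(existing)
    if offset == 1:
        return ordered[-1] + 1 if ordered else 1  # creates a new generation
    if offset > 1:
        raise ValueError("only (+1) may name a generation that does not exist")
    index = len(ordered) - 1 + offset  # (0) is the latest, (-n) the nth previous
    if index < 0:
        raise LookupError(f"no generation for {relative!r}")
    return ordered[index]
```

In both schemes the special specifiers are resolved relative to the set of
versions existing at the time of the reference, so the same name can denote
different versions at different times.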

The Programmer's WorkBench under Unix provides a facility called the
Source Code Control System [5, 11, 27] for version maintenance. SCCS
allows versions to be arranged in a hierarchy, with the names representing
a derivation sequence. Versions are stored as sets of updates to the
previous version. I shall discuss SCCS further in Section III.D, "A
Versioned Object."

II.D. Summary.

In this chapter, I have discussed various existing systems which
significantly influenced the research presented in the following chapters
of this report. The file systems of Honeywell's Multics and of
Hewlett-Packard's MPE/3000 were described, with an examination of their
abstract file structures. The Unix file system is very similar to that of
Multics, except that a segment can be contained in more than one directory.
The structure of the Hydra file system was also discussed, especially the
structure of the objects stored. Finally, existing version maintenance
systems were described, including TOPS-20, ITS, OS/VS1, and the Source Code
Control System.


DEFINITION OF A "CATOAN-OBJECT"

In this chapter, I describe the objects managed by Catoan. First, the
"basic" object, its characteristics, its operations, and its representation
will be defined. Then, a "refined" object, whose operations are less
primitive than those of the basic object, will be presented. Lastly,
objects which have explicit versions (such as programs) will be described.

III.A. Issues: Containment and Trust.

As will be shown later in this chapter, there are three ways to put
data in a Catoan-object; all of them are different, all have different
semantics and characteristics. But, why three ways?

In Chapter One, I wrote that "multiple filing systems could exist."
Furthermore, "the filing system (object manager) need not be trusted to not
modify or leak data." Both of these issues involve trust: need one trust
the filing system, and, if not, what can be done about it?

What does it mean to "contain" something? What does it mean to "trust"
something? In this introductory section, I shall explore these ideas as
they relate to Catoan. Some of the issues I shall raise may not be clear
until later in the chapter; I think this is better than delaying their
discussion, however.

First, though, a little groundwork must be laid. The unit of storage
in Catoan is the "Catoan-object"; let a typical Catoan-object be called
"CRL." In data abstraction terminology, Catoan implements the abstraction
"Catoan-object." Catoan-objects can "contain" other Catoan-objects, and
other kinds of abstractions, too. Each Catoan-object has a DIRECTORY and a
CONTENTS; the things one normally puts in each of these are different, and
things are put in them for different reasons, as will be explained.

III.A.1. Containment and Catoan.

What does it mean for a Catoan-object to "contain" another
Catoan-object? What does it mean for a Catoan-object to "contain" any kind
of (abstract) object?

Each Catoan-object (such as CRL) has a "CONTENTS," which specifies the
abstract object which is the data of CRL. This is one form of
"containment": containment in the CONTENTS. The primary reason for
creating a Catoan-object is to provide a means for permanently storing,
referencing, and naming the data of the CONTENTS.

The data which the object contains (its CONTENTS) should be readily
accessible. It should be easy to read, easy to set, and easy to change the
CONTENTS. The CONTENTS could be used to hold the text of a letter which
was stored in a computing system which implemented Catoan.

In addition to a CONTENTS, Catoan-objects have a DIRECTORY. The
DIRECTORY has two parts: a "named" part, and an "unnamed" part. In the
named DIRECTORY of CRL, one would store references (hard links) to those
Catoan-objects considered to be sub-objects of CRL. This is usually a
logical grouping, and can be thought of as placing a segment in a certain
directory in a Multics or Unix file system. The sub-objects of CRL are
Catoan-objects in their own rights; changing their relationship with CRL
(that is, the exact sub-object to which a particular name refers) usually
is not done.

In the unnamed part of CRL's DIRECTORY are references (hard links, but
without local names attached to the reference) to Catoan-objects which are
physical sub-objects of CRL. Those Catoan-objects referenced in the
unnamed part of the DIRECTORY are part of the implementation of the
particular Catoan-object, and are not usually of interest to the object's
users. As with objects referenced in the named part of the DIRECTORY, the
relationship between CRL and the sub-objects in the unnamed part of the
DIRECTORY usually is not changed.

The objects in the DIRECTORY of a Catoan-object are considered less
accessible than the CONTENTS. Once a reference to an object is added to
the DIRECTORY, it cannot be replaced, but must be deleted and then added.
This reflects the accessibility semantics of such an inclusion. If these
semantics are not appropriate for a particular application, the CONTENTS
could be used to implement a directory which is interpreted by some
program. Because the CONTENTS of a Catoan-object can be an arbitrary
abstract object, the DIRECTORY portion of a Catoan-object can be ignored,
and the CONTENTS used to implement a filing system which is more natural
for the particular application.

All three of the forms of including data in an object might be used to
represent a system composed of a collection of programs (1). The
highest-level module in the system is a program, with the source stored in
the CONTENTS, representing the view of the source as the abstract program.
In the unnamed portion of the DIRECTORY would be the implementation of the
program object, including such things as documentation and object-code.
Named references to the programs comprising the system would be in the
named portion of the DIRECTORY. Chapter Four, "An Example: A Syspal
Program Object," describes the aspects of this example relating to programs
in more detail.

(1) I shall return to this example throughout the chapter.


III.A.2. Trust and Catoan.

What does it mean to "trust" a non-sentient entity? What does it mean
to "trust" a filing system? What does it mean to "trust" Catoan?

"Trust" in general is very difficult to define, especially when applied
to non-sentient entities. However, "trusting" a filing system is easier to
define. In this report, to trust a filing system is to give the filing
system access to data when it doesn't explicitly require such access to
perform its duties. My perception of a filing system's duties does not
include access to the CONTENTS (as defined in the previous section).
Rather, a filing system is a manager for named, permanent objects, not for
the CONTENTS of those objects.

A "trusted" module is a module which a) the user believes is secure,
and will not access things except on explicit instructions from the user,
and b) does not allow other users to access it, except as is appropriate
for that user. Part (a) is primarily a belief on the part of the user;
part (b) has some implications on the kind of information which the trusted
module can supply to environments outside the module.

Specifically, a trusted module, in order to prevent other entities from
accessing its protected data, cannot give out any references to any portion
of the protected data's internal representation. Rather, it must give out
an indirect reference, which the trusted module, and only the trusted
module, can translate into the actual representation of the protected data.
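
The idea of an indirect reference can be sketched in Python as an opaque
handle that only the module can translate. This is a minimal illustration
under stated assumptions: TrustedStore, put, and read are hypothetical
names, and Python's leading-underscore convention stands in for real
protection, which the language itself does not enforce.

```python
import secrets
from typing import Dict, Tuple


class TrustedStore:
    """Hands out opaque handles instead of references to its representation."""

    def __init__(self) -> None:
        # handle -> (owner, data); the table never leaves the module.
        self._table: Dict[str, Tuple[str, bytes]] = {}

    def put(self, owner: str, data: bytes) -> str:
        """Store a private copy of the data and return an indirect reference."""
        handle = secrets.token_hex(8)
        self._table[handle] = (owner, bytes(data))
        return handle

    def read(self, handle: str, principal: str) -> bytes:
        """Translate a handle into data, after checking who is asking."""
        owner, data = self._table[handle]  # KeyError for an unknown handle
        if principal != owner:
            raise PermissionError(f"{principal!r} may not read this object")
        # bytes is immutable, so the caller cannot alter the stored copy.
        return data
```

Every access passes through read, so the module can check the identity of
the accessor each time; a caller holding only the handle has nothing it can
dereference on its own.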

Catoan, however, gives out a pointer to portions of the representation
of a Catoan-object. The CONTENTS_READ operation (see Section III.B.1.b)
returns a pointer to the CONTENTS of the Catoan-object. This allows
entities besides Catoan to access part of the representation of the
Catoan-object.

Because a module which gives out portions of its data's representation
does not have total control over the representation, it does not have total
control over what can be done to the representation, and so is unable to
ensure certain kinds of internal consistency. In the case of Catoan, for
example, the information accessed by the "principals" and the "dates"
sub-classes of operations may not be accurate. Furthermore, Catoan has no
way of verifying the identities of those accessing the data in the CONTENTS
of the Catoan-object, because they may be accessing the data without using
Catoan.
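
The loss of consistency can be demonstrated with a small Python sketch.
The method names follow the operations listed for the basic object, but the
LeakyObject class itself, and the use of a mutable bytearray to stand for
the pointed-to CONTENTS, are assumptions made purely for illustration.

```python
from typing import Optional


class LeakyObject:
    """An object whose contents_read returns its actual representation."""

    def __init__(self) -> None:
        self._contents: Optional[bytearray] = None
        self._last_modifier: Optional[str] = None

    def contents_set(self, contents: bytearray) -> None:
        self._contents = contents  # stores the reference, not a copy

    def contents_read(self) -> bytearray:
        if self._contents is None:
            raise LookupError("contents_doesnt_exist")
        return self._contents  # the caller now holds the representation

    def last_modifier_set(self, principal: str) -> None:
        self._last_modifier = principal

    def last_modifier_read(self) -> Optional[str]:
        return self._last_modifier


# A cooperative principal records its modification.
obj = LeakyObject()
obj.contents_set(bytearray(b"draft"))
obj.last_modifier_set("alice")

# Another principal changes the data through the pointer, without
# going through any operation that records who did it.
pointer = obj.contents_read()
pointer[0:5] = b"DRAFT"
```

After the second modification the object still reports "alice" as its last
modifier, because the change was made through the pointer rather than
through the object's operations.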

This report examines some of the implications of not trusting the
filing system. The filing system will have access to its objects
(Catoan-objects), but not to the data in the CONTENTS of the Catoan-object.
This is done partly out of lack of trust, and partly to allow more than one
filing system to exist in the host computing system more easily.

III.B. The Basic Object.

An OBJECT ("Catoan-object") is the basic unit of data in Catoan.
Catoan-objects conceptually have three parts: SYSTEM OVERHEAD INFORMATION,
a DIRECTORY, and a CONTENTS. The first is information the system keeps
about each object, such as when it was created. The DIRECTORY and CONTENTS
were described in Section III.A.1 above.

Figure 3 shows the operations and representation of a Catoan-object.
Many points in the figure and the immediately ensuing discussion may be
unclear. Subsequent sub-sections in this section will clarify the
problems.

Most of the operations on an object are related to the "SYSTEM OVERHEAD
INFORMATION" in the object. There are only eight operations dealing with
the DIRECTORY, and only two with the CONTENTS. Yet, these two parts of an
object are the most interesting. The SYSTEM OVERHEAD INFORMATION is very
structured, and has a very limited scope; we know the form it will take
long before the object is actually defined. The DIRECTORY, on the other
hand, may change drastically during the existence of the object: it may
start off empty, have some objects added to it, have some objects deleted
from it, and will have an unpredictable size. Similarly, the structure and
size of the CONTENTS are unpredictable, and the structure might never be
known to Catoan.



MODULE catoan_object(contents_type);

    new: PROCEDURE
        RETURNS(o: @catoan_object)
        (* Make a new catoan_object. *);
    delete: PROCEDURE
        (* Delete a catoan_object. *);

    contents_set: PROCEDURE(c: @contents_type)
        (* Stow the contents of the object. *);
    contents_read: PROCEDURE
        RETURNS(c: @contents_type)
        EXCEPTION(contents_doesnt_exist)
        (* Retrieve this object's contents. *);

    directory_unnamed_add: PROCEDURE(o: @catoan_object, n: INTEGER)
        EXCEPTION(directory_full, directory_slot_occupied)
        (* This object now includes unnamed object number N. *);
    directory_unnamed_delete: PROCEDURE(n: INTEGER)
        EXCEPTION(directory_doesnt_exist,
                  directory_doesnt_contain_object)
        (* Remove Nth entry from unnamed portion of DIRECTORY. *);
    directory_unnamed_lookup: PROCEDURE(n: INTEGER)
        RETURNS(o: @catoan_object)
        EXCEPTION(directory_doesnt_exist,
                  directory_doesnt_contain_object)
        (* Return Nth entry from unnamed portion of DIRECTORY. *);
    directory_named_add: PROCEDURE(n: object_name, o: @catoan_object)
        EXCEPTION(directory_full, directory_slot_occupied)
        (* This object now includes another named object. *);
    directory_named_delete: PROCEDURE(n: object_name)
        EXCEPTION(directory_doesnt_exist,
                  directory_doesnt_contain_object)
        (* This object no longer includes a certain object. *);
    directory_named_contains: PROCEDURE(n: object_name)
        RETURNS(b: boolean)
        EXCEPTION(directory_doesnt_exist)
        (* Does this object contain object 'n'? *);
    directory_named_lookup: PROCEDURE(n: object_name)
        RETURNS(o: @catoan_object)
        EXCEPTION(directory_doesnt_exist,
                  directory_doesnt_contain_object)
        (* Translates a contained object-name into an object reference. *);
    directory_named_read: PROCEDURE
        RETURNS(n: ARRAY(*) OF object_name)
        EXCEPTION(directory_doesnt_exist)
        (* Which objects does this one contain? *);

    owner_read: PROCEDURE
        RETURNS(p: principal_id)
        (* Who owns this module? [obtained from mem_mgr] *);
    creator_set: PROCEDURE(p: principal_id)
        (* Indicate that principal 'p' is object's creator. *);
    creator_read: PROCEDURE
        RETURNS(p: principal_id)
        (* Who created this object? *);
    last_modifier_set: PROCEDURE(p: principal_id)
        (* State who last modified this object. *);
    last_modifier_read: PROCEDURE
        RETURNS(p: principal_id)
        (* Who last modified this object? *);

    date_created_set: PROCEDURE(d: date)
        EXCEPTION(date_invalid)
        (* Indicate when the object was made. *);
    date_created_read: PROCEDURE
        RETURNS(d: date)
        (* When was this object created? *);
    date_last_modified_set: PROCEDURE(d: date)
        EXCEPTION(date_invalid)
        (* Indicate when this object was last modified. *);
    date_last_modified_read: PROCEDURE
        RETURNS(d: date)
        (* When was this object last modified? *);
    date_last_accessed_set: PROCEDURE(d: date)
        EXCEPTION(date_invalid)
        (* Indicate when this object was last accessed. *);
    date_last_accessed_read: PROCEDURE
        RETURNS(d: date)
        (* When was this object last accessed? *);

    size_read: PROCEDURE
        RETURNS(s: integer)
        (* How big is this object?
           [overhead + mem_mgr.size(CONTENTS) + mem_mgr.size(DIRECTORY)] *);

    object_name = string(20);
    principal_id = string(20);

    date = RECORD
        year: 1975 TO 3975;  !assumption: system will last <2000 years
        month: 1 TO 12;
        day: 1 TO 31;
        hour: 0 TO 23;
        minute: 0 TO 59;
        second: 0 TO 59.9999 PRECISION 4
    END;  !date

    SELF = RECORD  !Representation of an object.
        contents = @contents_type;
        (date_created,
         date_last_modified,
         date_last_accessed) = date;
        (creator,
         last_modifier,
         owner) = principal_id;
        directory = RECORD
            named: ARRAY(*) OF RECORD
                n: object_name;
                o: @catoan_object;
            END;  !named
            unnamed: ARRAY(*) OF @catoan_object;
        END;  !directory
    END;  !SELF

END MODULE;  !catoan_object


Figure 3: The Basic Catoan-Object.


Assume that the CONTENTS of a Catoan-object holding a Syspal program is
to be of type "text." Figure 4 shows how CRL would be declared, and how
one would store and retrieve its CONTENTS.

text = TYPE ...;
edit_buffer: text;
crl: catoan_object;


crl := NEW catoan_object(text);


edit_buffer := contents_read(crl);


contents_set(crl, edit_buffer);


Figure 4: Catoan-Object with CONTENTS of Type "text."


III.B.1. The Operations of the Basic Object.

The operations on an object can be classified according to the
information they reference. The classes of operations are overhead:
instance, principals, dates, miscellaneous; contents; and directory. Each
class will be considered below.

III.B.1.a. Overhead-Class Operations: Instance.

The "instance" operations are NEW and DELETE. These operations are
invoked whenever a Catoan-object is created or deleted. NEW sets up the
initial contents of the overhead information, and initializes the DIRECTORY
and CONTENTS to be empty (NULL). DELETE passes a message to each of the
Catoan-objects referenced in the DIRECTORY and to the object referenced in
the CONTENTS indicating that they are no longer referenced by CRL, and
deletes CRL.


III.B.1.b. CONTENTS-Class Operations.

The "contents" operations are CONTENTS_SET and CONTENTS_READ. They
deposit data into, and extract data from, a Catoan-object's CONTENTS. The
argument to _SET (1) and the return value from _READ are pointers to the
type of the CONTENTS, as specified by the module parameter ("contents_type"
in Figure 3) when the Catoan-object was instantiated by NEW. (For example,
if CRL is defined as in Figure 4, _SET takes and _READ returns something of
type @TEXT.)

The effects of _SET and _READ are to translate between Catoan-object
references and Syspal-object references. Notice that both operations work
with pointers, and not directly with the data. The _READ operation is
analogous to the OPEN operation in a classical file system; the _SET
operation is analogous to CLOSE.


(1) This is a shorthand notation for "CONTENTS_SET." When there will be no
confusion as to the meaning and context, the prefix (portion of the name
before the "_") will be omitted. A similar convention will be used for
eliding suffixes.

The _SET operation must ensure that the Catoan-object and its
components are safely stored in non-volatile storage. Hopefully, part of
the interface of the memory manager is an operation like MAKE_NON_VOLATILE,
which, if all memory is non-volatile, and there is no buffering in volatile
memory (by the memory manager), may be a null operation. Similarly, _READ
might "stage" some part of the contents by calling the memory manager's
PRIME_BUFFER operation.

Ilt.B.l.c.  DIRECTORY -Class  Operations. 


The  DIRECTORY  of  CRL  specifies  those  Catoan-obj ects  which  are 
sub-objects  of  CRL.  There  are  two  parts  to  the  DIRECTORY  of  a 
Catoan-obj ect:  the  named  part,  and  the  unnamed  part.  The  two  parts 
represent  different  logical  relationships  between  CRL  and  Its  sub-objects. 
The  DIRECTORY  Is  described  In  Section  ill. A. 1. 


The unnamed portion of the DIRECTORY represents those Catoan-objects
which are internal sub-objects of CRL. Generally, these are part of the
implementation of the abstraction which uses CRL, and are of no concern to
CRL's users. An example of using the unnamed portion of the DIRECTORY is
shown in Chapter Four, "An Example: A Syspal Program Object," where it is
used for (among other things) the object-code of a program.

The named portion of the DIRECTORY represents Catoan-objects which the
user feels are logically parts of CRL. He might, for example, build CRL
from several component objects, thereby forming one Catoan-object from
several sub-Catoan-objects.

The operations on the unnamed portion of the DIRECTORY are
DIRECTORY_UNNAMED_ADD, _DELETE, and _LOOKUP. _ADD inserts a sub-object in
the Nth DIRECTORY slot; _DELETE removes a specified unnamed entry from
the DIRECTORY. _LOOKUP returns the object referenced by the Nth entry in
the unnamed portion of the DIRECTORY.

The operations on the named portion of the DIRECTORY are
DIRECTORY_NAMED_ADD, _DELETE, _CONTAINS, _LOOKUP, and _READ. _ADD
associates a name and an object reference in CRL's DIRECTORY; _DELETE
removes such an association. _CONTAINS is a predicate which indicates
whether the supplied name is in the DIRECTORY; _LOOKUP translates a name
to an object reference. _READ returns a matrix containing all the names in
the DIRECTORY, and is supplied so that a DIRECTORY can be searched.
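
The two groups of DIRECTORY operations can be sketched as follows. This
is only an illustration, written in Python rather than Syspal; the class
and method names are hypothetical, the caller is assumed to supply the
slot number N, and protection and storage management are ignored.

```python
class Directory:
    """Hypothetical model of a Catoan DIRECTORY with its two parts."""

    def __init__(self):
        self.named = {}      # name -> object reference
        self.unnamed = {}    # slot number N -> object reference

    # Operations on the unnamed portion.
    def unnamed_add(self, n, obj):
        self.unnamed[n] = obj

    def unnamed_delete(self, n):
        del self.unnamed[n]

    def unnamed_lookup(self, n):
        return self.unnamed[n]

    # Operations on the named portion.
    def named_add(self, name, obj):
        self.named[name] = obj

    def named_delete(self, name):
        del self.named[name]

    def named_contains(self, name):
        return name in self.named

    def named_lookup(self, name):
        return self.named[name]

    def named_read(self):
        # All the names, so that the DIRECTORY can be searched.
        return sorted(self.named)


crl = Directory()
source_text, object_code = object(), object()
crl.named_add("source", source_text)     # a part the user cares about
crl.unnamed_add(1, object_code)          # an internal sub-object
```

The named entry is visible to anyone listing the DIRECTORY with
named_read; the unnamed entry is reachable only by slot number.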

III.B.1.d. Overhead-Class Operations: Principals.

The  "principal"  operations  obtain  and  manipulate  the 
principal-identifiers  stored  in  the  overhead  portion  of  a  Catoan-object. 

The identity of the Catoan-object's owner (the principal paying for the
storage), creator, and last modifier are accessed through the operations
OWNER_READ, CREATOR_SET and _READ, and LAST_MODIFIER_SET and _READ (1).

The creator and last-modifier can be changed; the owner is obtained from
the memory management system.

III.B.1.e. Overhead-Class Operations: Dates.

The "date" operations provide access to the time and date when various
operations last occurred for the Catoan-object. Available are times and
dates for the Catoan-object's creation, last modification, and last access.
These operations are DATE_CREATED_SET and _READ, DATE_LAST_MODIFIED_SET and
_READ, and DATE_LAST_ACCESSED_SET and _READ. The dates automatically
maintained by Catoan are for creating, modifying, and accessing the
Catoan-object, not the CONTENTS of the Catoan-object. This is related to
the trust issue discussed in Section III.A.2.

III.B.1.f. Overhead-Class Operations: Miscellaneous.

The "miscellaneous" operations provide information about the physical
size of the Catoan-object. SIZE_READ obtains the sizes of the CONTENTS,
DIRECTORY, and overhead from the memory manager, and returns their sum.


(1) The _SET operations are generally not explicitly used, and exist
primarily for completeness.

III.B.2. Comments on the "_SET" Operations.

The inclusion of some of the _SET operations (1) may be puzzling. For
example, why is there a DATE_MODIFIED_SET operation? Won't Catoan take
care of such things?

Recall that Catoan is part of the optional extensions to the kernel
operating system. Furthermore, Catoan is not necessarily trusted, and it
is possible to access portions of Catoan-objects (specifically, the data in
the CONTENTS) without using Catoan. A user who directly accesses the data
in the CONTENTS (for example) might want to update the SYSTEM OVERHEAD
INFORMATION in a containing Catoan-object so that it accurately reflects
what has happened.

It is possible that a failure of the host computing system's hardware,
the operating system kernel, or Catoan may introduce errors into
Catoan-objects. These errors may require human intervention. Even in a
trusted filing system like that on Multics, the ability for people to
access some of the "overhead" fields is considered necessary. In a
non-trusted filing system, such abilities are mandatory so that "expected
errors" (2) can be corrected.

(1) Specifically, the CREATOR_, LAST_MODIFIER_, DATE_CREATED_,
DATE_LAST_MODIFIED_, and DATE_LAST_ACCESSED_ _SET operations.

(2) One of the reasons a system might not be trusted by its users is that
the users expect the system to make mistakes (that they can, perhaps,
correct).

III.B.3. Naming and the DIRECTORY.

Each Catoan-object contains a DIRECTORY part. This DIRECTORY specifies
all those objects which are sub-objects of, for example, CRL; the
contained objects need not have names associated with them, in which case
they are referenced numerically. See Section III.A.1 for a discussion and
example of DIRECTORY use.

If one wanted to implement a Multics-like directory, the CONTENTS of
the object would be NULL; for a Multics-like segment, the DIRECTORY would
be empty. But, one can have a non-empty DIRECTORY and a non-empty CONTENTS
at the same time, thereby allowing objects to "contain" other objects.

Multics has the concept of a "soft link," between a local name and a
global name. No such concept exists in Catoan. Rather, because an object
can be in the DIRECTORies of many objects, the same object can be
referenced directly by many local names. This is often referred to as a
"hard link," and is similar to the Unix link.

One of the implications of the unrestricted DIRECTORY inclusion is
that, rather than implementing a naming hierarchy, Catoan realizes a naming
network. Just as object A can contain more than one object, so can more
than one object contain object A. Furthermore, loops can be created in the
network, by A containing B which contains A.

An  advantage  of  this  arbitrary  network  structure  is  that  it  can  more 
readily  reflect  the  structure  of  some  objects.  Recursive  objects  and 
objects  which  include  other  objects  exist  in  the  world;  it  would  be  nice 
if  one  could  model  them  in  a  computing  system.  Such  object  inclusion  also 
aids  in  modularity.  For  example,  if  one  were  implementing  a  network-model 
database,  one  could  define  the  network  parent-child  relationships  using  the 
DIRECTORY  of  each  object  to  contain  the  children. 

Allowing a general network in the naming structure presents a problem
only when the entire naming network must be walked. If it is deemed
important to be able to walk the network, VISITED flags must be included in
each Catoan-object, which must be reset upon completion of the network
traversal. If such flags ARE included, it may be necessary to reset them
all upon system restart, to guard against a failure during a walking of the
network, and subsequent traversals encountering a non-existent loop because
a VISITED flag stayed set from a previously aborted walk. Various problems
besides system failure exist when the network must be walked; for example,
what if a walk aborts for a reason other than system failure? I shall not
discuss such problems here, but, rather, refer the interested reader to the
literature (garbage collection algorithms often solve this problem; see,
for example, [3, 4, 30, 33]).
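
A walk over a naming network with loops can be sketched as below. Note
that this sketch substitutes a different technique for the one described
above: instead of a VISITED flag stored in each Catoan-object, the walker
keeps its own set of visited objects, so nothing remains set if the walk
is abandoned. The Node class and the walk function are hypothetical names
used only for illustration.

```python
class Node:
    """Hypothetical object whose DIRECTORY is a list of sub-objects."""

    def __init__(self, name):
        self.name = name
        self.directory = []


def walk(root):
    """Visit every object reachable from root exactly once."""
    visited = set()          # identities of the objects already seen
    order = []
    stack = [root]
    while stack:
        node = stack.pop()
        if id(node) in visited:
            continue         # reached again through a loop or a second parent
        visited.add(id(node))
        order.append(node.name)
        # Push in reverse so sub-objects are visited in DIRECTORY order.
        stack.extend(reversed(node.directory))
    return order


a, b, c = Node("A"), Node("B"), Node("C")
a.directory = [b, c]
b.directory = [a]            # A contains B, which contains A
names = walk(a)
```

The walk terminates despite the loop because an object already in the
visited set is never expanded a second time.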

As  long  as  the  network  does  not  have  to  be  walked,  loops  and 
self-containment  do  not  present  a  problem.  The  only  other  traversal  of  the 
naming  network  is  for  resolving  a  name,  which  is  directed  by  the  name  to  be 
resolved.  If  a  name  hits  a  loop,  intentionally  or  unintentionally,  the 
results  may  be  unexpected,  but  the  system  will  not  incur  any  great  problem 
(like  an  infinite  loop),  because  the  name  must,  by  its  physical  properties, 
have a finite length. If there are soft links, however, name resolution
may enter an infinite loop if a cycle of links is encountered (1).
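
The argument that name resolution must terminate can be illustrated with
a small sketch. Directories are modeled here as Python dictionaries, and a
name is assumed to be already split into its components; both are
simplifications made for this example, not Catoan's representation.

```python
def resolve(start, components):
    """Follow one DIRECTORY entry for each component of the name."""
    obj = start
    for component in components:
        obj = obj[component]     # raises KeyError if the name is absent
    return obj


# A loop in the naming network: A contains B, and B contains A.
a = {}
b = {}
a["B"] = b
b["A"] = a

# The name passes around the loop, yet resolution takes exactly three steps.
found = resolve(a, ["B", "A", "B"])
```

Each step consumes one component of the name, so the number of steps is
bounded by the length of the name, whatever the shape of the network.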

III.B.4.  Storing  Data:  The  CONTENTS. 

The  purpose  of  Catoan,  and  of  any  filing  system,  is  to  allow  the  users 
of  a  computing  system  to  retain  data  for  long  periods  of  time.  For  this 
purpose,  Catoan  objects  have  a  component  called  the  "CONTENTS."  It  is  in 
the  CONTENTS  that  the  actual  data  are  stored. 

(1) Multics has this problem; its solution is to abort link resolution
after encountering some number of consecutive links.

Most filing systems are "record" oriented: one retrieves ("reads") and
deposits ("writes") bit- or byte-strings, or some collection of bits or
bytes ("records"). The structure of the data is very visible to the file
system, and to the user of the file system. Furthermore, the user MUST
know the structure of the data: not necessarily how it is stored
physically, but usually at least how it is stored logically ("logical
records").

The  CONTENTS  of  a  Catoan  object  is  of  arbitrary  type;  Catoan  has  no 
explicit  knowledge  of  the  structure  of  the  data  in  the  CONTENTS. 

Therefore,  the  CONTENTS  must  be  handled  in  its  entirety  through  a  pointer, 
rather  than  piecemeal  (as  in  many  other  filing  systems) .  Because  Catoan 
works  with  abstract  data,  users  of  Catoan  can  view  the  contents  abstractly, 
and  can  deposit  and  retrieve  arbitrary  data  structures.  There  is  no 
explicit  notion  of  records  in  Catoan. 

Because a pointer to the data in the CONTENTS is returned, rather than
a copy of the data, sharing of the data in Catoan-objects is provided. If
one wanted to implement an airline reservation system with many agents
accessing a shared database, the database could be stored as a
Catoan-object, retrieved from Catoan, and then manipulated by the
operations defined on the database. If a text editor were implemented
where it was desired to operate on a copy of the original data, a new
object containing a copy of the data in the CONTENTS would be created,
operations performed on the copy, and then, perhaps, the copy stored in
place of the old CONTENTS.
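
The difference between sharing through a pointer and editing a copy can
be sketched as follows. The CatoanObject class is a hypothetical stand-in
that models only the CONTENTS; Python object references play the part of
pointers, and the flight and seat data are invented for the example.

```python
import copy


class CatoanObject:
    """Hypothetical stand-in for a Catoan-object holding only CONTENTS."""

    def __init__(self, contents=None):
        self._contents = contents

    def contents_read(self):
        # Return the reference itself, never a copy of the data.
        return self._contents

    def contents_set(self, contents):
        self._contents = contents


# Airline reservations: both agents work on the same shared data.
database = CatoanObject({"flight 12": 3})          # seats remaining
agent_one = database.contents_read()
agent_two = database.contents_read()
agent_one["flight 12"] -= 1
seats_seen_by_agent_two = agent_two["flight 12"]

# Text editor: create a new object holding a copy, edit the copy,
# then store the copy in place of the old CONTENTS.
document = CatoanObject(["first line"])
scratch = CatoanObject(copy.deepcopy(document.contents_read()))
scratch.contents_read().append("second line")
original_while_editing = list(document.contents_read())
document.contents_set(scratch.contents_read())
```

The second agent sees the first agent's update at once, while the
document is unchanged until the edited copy is explicitly stored.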

III.B.5. Protection and Security.

An interesting consequence of the way Catoan stores data is that Catoan
need not be trusted with the data. True, it could maliciously delete an
object, but it cannot leak parts of the contents of the object to other
users. All Catoan could leak would be the entire object. If one wanted to
store one's data securely, so that no one else could read it, one could
store it as the CONTENTS of a Catoan object, and simply not give anyone an
interface to the module that implements the data in the CONTENTS. All
Catoan can do is to leak the entire CONTENTS of the object; if the
interface is not also possessed, the CONTENTS does no one any good.

It might be undesirable to let even the CONTENTS of the object reach
"unfriendly" hands. For example, it might be necessary for someone to have
an interface to the module which implements the object stored in the
CONTENTS, and yet he should be restricted from using the CONTENTS of a
particular Catoan-object. Such protection can be provided through various
schemes, ranging from passwords, to access control lists, to capabilities.
Passwords can be included easily in Catoan, by adding PASSWORD_SET and
PASSWORD_VERIFY operations to the module, for example. This, however,
requires trusting Catoan to properly implement password protection.

Similarly,  Catoan  could  implement  access  control  lists,  and  could 
verify  the  right  of  some  principal  to  perform  certain  operations  on  a  given 
object.  This,  again,  requires  trusting  Catoan  to  properly  enforce  the 
protection. 

If  one  does  not  want  to  have  to  trust  the  object  manager  with  his  data, 
what  can  be  done?  Capabilities  [28]  offer  a  solution.  If  someone  does  not 
have  the  capability  of  something,  that  thing  cannot  be  accessed,  because  it 
cannot  be  named.  This  level  of  protection  must  be  enforced  by  the  system's 
memory  manager. 

In  a  capability-based  system,  implementing  a  directory-walking 
mechanism  for  name  resolution,  where  all  resolution  begins  from  a  "root"  as 
in  a  Multics-like  file  system,  allows  all  users  access  to  all  objects.  To 
resolve  a  name,  start  at  the  root  (to  which  all  have  access);  find  the  name 
in  the  directory,  and  use  the  capability  there  found  to  proceed  to  the  next 
node,  where  the  process  is  repeated.  Since  names  are  "translated"  directly 
to  capabilities,  and  since  capabilities  are  the  mechanism  on  which 
protection is based, naming and protection become equated. Since naming is
universal in a Multics-like system, there is no protection.

What is needed is a restriction on the initial entry into the naming
network. Providing a single node (object) from which all other nodes
(objects) can be reached is the problem in the Multics-like name resolution
in a capability system. Each user must be able to name, to find in some
accessible directory, only those objects which should be accessible to him.
This requires a per-user directory of objects initially accessible upon
entry to the system, and then careful control of which capabilities are
given to which users, and to which additional objects, besides the one
directly referenced, access is granted (that is, which objects are
contained in the DIRECTORY of the directly referenced object). This
restriction is a general property of capability-based protection systems.
A more detailed description of the issues underlying this discussion is in
[28].

Part of Catoan's job is to produce internal names (like capabilities)
from external names (like character strings). This is the job of the
directory manipulation operations of an object. The DIRECTORY_LOOKUP
operation "translates" a character string into an object, thereby
generating an internal name, or capability. The solution here comes from a
refinement to the basic capability mechanism, and requires introducing
"locked" capabilities.

A locked capability has, in addition to the reference to an object, a
"lock" associated with it. A locked capability is implemented by a trusted
module, such as described in Section III.A.2. In order to access the
capability protected by the lock, and the data protected by the locked
capability, an accessor must go through the proper type manager, which can
verify the accessor's identity and rights in whatever way its implementor
pleases. In other words, in order to use the locked capability, a "key"
fitting the lock must be presented.

Capability locks and keys, like capabilities themselves, must be
unforgeable (locked capabilities must also be unforgeable). Thus, if one
wants  to  place  an  object  in  a  somewhat  publicly  available  directory  (as  may 
be  required,  because  all  directories  may  be  "somewhat  publicly  available"), 
and  yet  retain  control  over  who  can  access  the  object,  a  locked  capability, 
rather  than  an  ordinary  capability,  is  placed  in  the  directory.  The  key 
for  the  locked  capability  is  then  distributed  in  a  secure  manner  to  those 
who  are  allowed  access  to  the  object. 
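
The idea of a locked capability can be sketched as below. This is an
illustration only: Python cannot make references unforgeable, so the
sketch shows the lock-and-key check and nothing more. The class, the
exception, and the use of a random byte string as the key are assumptions
made for the example; a real type manager could verify the accessor in
whatever way its implementor pleases.

```python
import hmac
import secrets


class UnauthorizedAccess(Exception):
    """Raised when the key presented does not fit the lock."""


class LockedCapability:
    """Hypothetical pairing of an object reference with a lock."""

    def __init__(self, target, key):
        self._target = target
        self._lock = key

    def unlock(self, key):
        # Constant-time comparison of the presented key with the lock.
        if not hmac.compare_digest(self._lock, key):
            raise UnauthorizedAccess("key does not fit the lock")
        return self._target


key = secrets.token_bytes(16)          # distributed securely to allowed users
payroll = ["salary data"]

# The locked capability, not the ordinary one, goes in the public directory.
public_directory = {"payroll": LockedCapability(payroll, key)}
```

Anyone may find the entry in the directory, but only a holder of the key
obtains the underlying reference from unlock.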

III.C. A Refined Object.

The basic object, described above, is rather spartan. Often, a more
"civilized" object is desired which supplies features convenient for human
use. For example, one might want to provide a "classical file" object,
supporting record-at-a-time access. Perhaps more security, automatic
object cross-referencing, locking (or some other form of "sequencing"), or
version control might be desired. This section describes a more refined,
"civilized" object than that described above.

III.C.1. Protection and Security.

An important refinement to the basic Catoan-object is the addition of
further protection features. Given the directory lookup mechanism, a
capability-based protection system may provide little security, as
previously noted (Section III.B.5). A solution to this problem is to
provide an access control list scheme as a feature of a refined object.
The access control list would be a matching of principal identifiers with a
specification describing the types of access allowed the principal. This
is the scheme Multics uses, and is described in [28].

An alternative to the complete access control list is a Unix [26] or
MPE/3000 [13] protection scheme, which allows all members of particular
groups the same access. For example, all members of a particular project,
or of a particular sub-project, might be given access to the object. In
Multics, this would be represented as "*.Syspal," where "Syspal" was the
name of the project.

Further protection refinements can also be implemented. A
security-clearance concept (confidential, secret, top secret) is a
possibility, where each process would have an unforgeable indication of its
current clearance; passwords could be provided, requiring that the correct
password be supplied when the object is accessed; arbitrary protection
schemes, requiring access only between certain times, or on certain days,
or after a program sufficiently verifies the identity of the user, might be
desired. By making the various "protected objects" each a different type,
with different object managers, and allowing access only through the
correct manager, access to the objects can be restricted as desired.

A point of note is: what is being protected by the access control list
of the refined Catoan-object? Catoan is not necessarily trusted;
furthermore, it is possible to access the data in the CONTENTS of a
Catoan-object without the intervention of Catoan. Therefore, the access
control list cannot protect the data in the CONTENTS in the general case.
Rather, the access control list protects the Catoan-object, since that is
the only thing for which access requires using Catoan.

How, then, might the data in the CONTENTS of a Catoan-object be
protected? Locked capabilities, described above, offer one solution.
Another solution is to control the distribution of the data's
addressability. In a capability-based protection system, this implies not
distributing the capability for the data to other users, but instead
requiring them to use Catoan to access the data. This requires the user to
trust Catoan to enforce the access control list.


Figure 5 shows the operations and representation of an access control
list scheme. The access control list is implemented as an array, matching
principal identifiers with access rights. The access rights are specified
by bits indicating DIRECTORY read, DIRECTORY search, DIRECTORY modify,
DIRECTORY append, CONTENTS read, CONTENTS set, access control list read,
access control list modify, and access control list append. Each operation
protected by the access control list must verify that the principal
requesting the operation is authorized to perform the operation on the
object; if not, UNAUTHORIZED_ACCESS is signaled.


The ACL_ADD_PRINCIPAL operation gives a new principal access to a
Catoan-object. The arguments are the identifier of the principal and the
access specification. If the specified principal is already in the access
control list, an exception is signaled. _ADD_ACCESS adds the specified
access rights to a principal in the Catoan-object's access control list.

_DELETE_PRINCIPAL rescinds a principal's right to access the
Catoan-object. Similarly, _DELETE_ACCESS removes a particular access right
of a principal in the Catoan-object's access control list.



acl_add_principal: PROCEDURE(new_acl:acl, prin:principal_id)
    EXCEPTION(unauthorized_access, acl_principal_already_in_acl)
    (* Inserts new principal in the access control list. *);
acl_delete_principal: PROCEDURE(prin:principal_id)
    EXCEPTION(unauthorized_access, acl_principal_not_in_acl)
    (* Removes a principal from the access control list. *);
acl_add_access: PROCEDURE(add_acl:acl, prin:principal_id)
    RETURNS(old_acl:acl)
    EXCEPTION(unauthorized_access, acl_principal_not_in_acl)
    (* Ensures PRIN has specified permission. *);
acl_delete_access: PROCEDURE(del_acl:acl, prin:principal_id)
    RETURNS(old_acl:acl)
    EXCEPTION(unauthorized_access, acl_principal_not_in_acl)
    (* Ensures PRIN does not have specified permission. *);
acl_read: PROCEDURE
    RETURNS(acl:access_control_list_rep)
    EXCEPTION(unauthorized_access)
    (* Formats the access control list for external perusal. *);
acl_set: PROCEDURE(new_acl:access_control_list_rep)
    RETURNS(old_acl:access_control_list_rep)
    EXCEPTION(unauthorized_access)
    (* Allows bulk setting of the access control list. *);

access_control_list_rep = ARRAY(*) OF RECORD
    prin: principal_id;
    the_acl: acl;
    END; !access_control_list_rep
acl = RECORD
    dir_acl: ARRAY(4) OF BOOL (*Read, Search, Modify, Append.*);
    cont_acl: ARRAY(2) OF BOOL (*Read, Set.*);
    acl_acl: ARRAY(3) OF BOOL (*Read, Modify, Append.*);
    END; !acl
principal_id = string(20);

Figure  5:  An  Access  Control  List  Scheme  for  Catoan. 

_READ returns the entire access control list so that it can be examined
externally. This operation might be used to obtain an access control list
for use in setting some other Catoan-object's access control list, using
the _SET operation. _SET's argument is an entire access control list, like
the value returned by _READ.

The representation of an access control list consists of a sequence of
two-component RECORDS. Each RECORD consists of a PRINCIPAL_ID and an ACL.
The ACL is a three-component RECORD: the DIR_ACL, the CONT_ACL, and the
ACL_ACL. Each component is an ARRAY OF BOOL, with the several bits
corresponding to the various modes of permission which can be granted.
Each permission type is independent of all the others.
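
The access control list scheme can be sketched in Python as follows. The
sketch is an approximation, not a transcription of Figure 5: the three
ARRAYs OF BOOL are collapsed into one set of independent flag bits, the
list is held as a dictionary keyed by principal identifier, and the check
that the caller himself holds the access-control-list rights is omitted.
All class, method, and principal names are hypothetical.

```python
from enum import Flag, auto


class Access(Flag):
    """One independent bit for each permission named in the text."""
    DIR_READ = auto()
    DIR_SEARCH = auto()
    DIR_MODIFY = auto()
    DIR_APPEND = auto()
    CONT_READ = auto()
    CONT_SET = auto()
    ACL_READ = auto()
    ACL_MODIFY = auto()
    ACL_APPEND = auto()


class UnauthorizedAccess(Exception):
    pass


class PrincipalAlreadyInAcl(Exception):
    pass


class PrincipalNotInAcl(Exception):
    pass


class AccessControlList:
    def __init__(self):
        self._entries = {}              # principal identifier -> Access bits

    def add_principal(self, prin, acl):
        if prin in self._entries:
            raise PrincipalAlreadyInAcl(prin)
        self._entries[prin] = acl

    def delete_principal(self, prin):
        self._held(prin)
        del self._entries[prin]

    def add_access(self, prin, acl):
        old = self._held(prin)
        self._entries[prin] = old | acl     # ensure PRIN has the permission
        return old

    def delete_access(self, prin, acl):
        old = self._held(prin)
        self._entries[prin] = old & ~acl    # ensure PRIN lacks the permission
        return old

    def verify(self, prin, needed):
        """Signal UnauthorizedAccess unless PRIN holds every needed bit."""
        held = self._entries.get(prin, Access(0))
        if (held & needed) != needed:
            raise UnauthorizedAccess(prin)

    def _held(self, prin):
        if prin not in self._entries:
            raise PrincipalNotInAcl(prin)
        return self._entries[prin]


acl = AccessControlList()
acl.add_principal("Jones", Access.DIR_READ | Access.CONT_READ)
acl.verify("Jones", Access.CONT_READ)       # permitted: returns quietly
previous = acl.add_access("Jones", Access.CONT_SET)
```

An operation protected by the list would call verify before doing any
work, so that an unauthorized request is refused before the object is
touched.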

III.C.2. Cross-Referencing.

One often wants to determine which Catoan-objects reference CRL, and
which Catoan-objects CRL references. This requires two collections of
data: those objects referenced by CRL, and those objects which reference
CRL. The first set is the DIRECTORY of CRL, and so is readily available.
The second set, however, is not so readily available; it must be
explicitly collected.

How might such a cross-reference be implemented? Suppose each object
had a structure and operations like those of Figure 6 as part of its
definition. Then, upon adding a reference to an object's DIRECTORY, a call
to the contained object's XREF_ADD_REF operation would be included in the
implementation of DIRECTORY_NAMED_ADD and DIRECTORY_UNNAMED_ADD. (Similar
definitions and calls are required for XREF_DELETE_REF.)
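
The coupling between the DIRECTORY operations and the cross-reference
operations can be sketched as follows. The class is hypothetical, only the
named portion of the DIRECTORY is modeled, and the table of referencing
objects is an unbounded list, so no "full" exception arises here.

```python
class CrossReferencedObject:
    """Hypothetical object that records which objects reference it."""

    def __init__(self, name):
        self.name = name
        self.directory = {}      # local name -> contained object
        self.refd_bys = []       # (name of referencing object, that object)

    def xref_add_ref(self, name, obj):
        self.refd_bys.append((name, obj))

    def xref_delete_ref(self, obj):
        # Remove a single entry for the referencing object.
        for index, (_, referencer) in enumerate(self.refd_bys):
            if referencer is obj:
                del self.refd_bys[index]
                break

    def directory_named_add(self, local_name, child):
        self.directory[local_name] = child
        child.xref_add_ref(self.name, self)     # keep the child's table current

    def directory_named_delete(self, local_name):
        child = self.directory.pop(local_name)
        child.xref_delete_ref(self)


crl = CrossReferencedObject("CRL")
project = CrossReferencedObject("PROJECT")
project.directory_named_add("library", crl)
referencers = [name for name, _ in crl.refd_bys]
```

Because the bookkeeping lives inside the DIRECTORY operations, the two
collections cannot drift apart as long as all changes go through them.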

There is a problem with the above method for storing cross-reference
information: who pays for the storage? A straight-forward implementation
of a versioned-object would have the object's owner paying for the storage
of cross-reference information. This penalizes owners of very popular
objects, for the object's owner may have little control over the number of
referencing objects.

One solution is to ignore the problem; that is, to let the object's
owner  pay  for  the  object's  cross-reference  information.  Another 
possibility  is  for  the  accounting  system  to  keep  track  of  the  number  of 
cross-references  to  each  object,  and  to  deduct  the  charges  for  the 
cross-reference  information  from  the  object  owner's  bill.  This  would 
effectively  make  cross-reference  information  part  of  the  system's  overhead, 
and so all users would pay a share of the cross-reference storage costs.

xref_add_ref: PROCEDURE(name:object_name, obj:@object)
    EXCEPTION(xref_full);
    (* NAME is the name of the referencing object.
       OBJ is a reference to the referencing object. *)
    array_ref_out_of_bounds: EXCEPTION;
    EXCEPTION
        ON array_ref_out_of_bounds DO
            RETURN(xref_full);
        BEGIN
            SELF.xref.refd_bys[SELF.xref.next].obj := obj;
        END;
    END;
    SELF.xref.refd_bys[SELF.xref.next].name := name;
    SELF.xref.next +1;
    END PROCEDURE; !xref

xref: RECORD
    next: INTEGER;
    refd_bys: ARRAY(*) OF RECORD
        name: object_name;
        obj: @object;
        END; !refd_bys
    END; !xref


Figure 6: Additions to the Basic Object for Cross-Referencing.


Two problems exist with the system overhead solution. The first is
that it is inequitable: if a system has two users, with the first
referencing five objects not owned by him and the second referencing one
such object, both users would probably pay for three references, thereby
overcharging the second user. The second problem is: what prevents someone
from informing the system of far more references than actually exist to his
objects and (illegally) lowering his storage bills?

Assume that the following fragment is part of the XREF_ADD_REF
operation:

    CALL acctg_storage_add_ref(arguments)

The ACCTG_STORAGE_ADD_REF operation tells the accounting manager that a
cross-reference entry has been added to a particular object, and that the
object's owner should not be charged for the storage occupied by the entry.
This operation must be carefully protected; the only entities which are
allowed to call ACCTG_STORAGE_ADD_REF must be trusted by the accounting
manager not to call it excessively (that is, more times than are
appropriate for the number of references), because otherwise someone could
obtain free storage.

III.D. A Versioned Object.

Another refinement to the basic object is the "versioned" object.
Rather than directly modifying an object when changing it, a new instance
of the object is created, which is somehow related to a previous instance.
Therefore, rather than an object appearing mutable, it is a "history" of
immutable versions. This provides access to instances of the object
besides the most recent one, and facilitates, for example, concurrent
support and development of software.

III.D.1. Version Naming.

For each object, a hierarchy of versions exists, which is reflected in
each version's name. The hierarchical relationship is that of "logical
derivation": if Version B is the child of Version A, then B was "logically
derived from" A. For example, B might be a refinement of A, correcting an
implementation error if the object were a program. Alternatively, B might
become a sibling-version to A, which could imply that A and B were similar
sorts of refinements (improvements, modifications) of their mutual parent.
Whether a version is a child or a sibling is the decision of the version's
author.

A version name consists of a sequence of qualifiers to the object name.
These qualifiers are suffixes to the object name or to a "qualified" object
name (an object name with a version name suffix). Each qualifier is a
number, specifying the version number from the appropriate level in the
version hierarchy which is desired. The name "CRL.3.62" is a qualified
object name, whose object name is "CRL," and whose version name is "3.62."

Figure 7 shows a sample version hierarchy. The versioned
Catoan-object is named "CRL." CRL has three "top-level" versions; that
is, three versions which are, in some sense, major modifications of CRL.
In system installation terms, this level in the tree might correspond to a
"release," with lower levels being called "level" and "fix." To obtain CRL
release two, one would use the name "CRL.2"; to obtain CRL release three,
level one, "CRL.3.1" would be used.


Figure  7:  Version  Naming  Hierarchy. 


Examining the CRL.3 subtree, there are two children of CRL.3: CRL.3.2
has no children; CRL.3.1 has three children. In system installation
terms, one might reference CRL release three level one fix two as
"CRL.3.1.2."


There are no restrictions on the semantics attached to the various
levels in the hierarchy. For example, rather than "system installation,"
version management could be used in a class on software engineering.
Suppose an exercise in modifying existing programs is to be given. The
students might be broken into groups, with each group developing a
solution. The initial program is CRL; each group is to create its
solution as CRL.n. While working on the assignment, various trial
solutions might be attempted, with modifications being made in an attempt
to produce a better solution. Perhaps one group has one small part of the
problem remaining which is especially difficult, and so two of the group
members attempt a solution in parallel. All of this could be handled very
easily with the version maintenance system proposed in this chapter.

III.D.2. Storing and Implementing Versions.

Storing versions is a problem distinct from naming, though the two are
often coupled, especially if versions are stored as incremental changes to
other versions, as in the Source Code Control System (SCCS) [5, 11, 27]
available with the Programmer's WorkBench under Unix [8, 18]. SCCS stores
a set of versions as a collection of updates run against the parent
version. A version is created from some particular existing version, is
named relative to that version, and is generated from that version. (The
version generation process is recursive if necessary.)


By de-coupling version naming from version generation, additional
flexibility is obtained, without sacrificing the potential benefits of
coupling naming and generation (coupling can be done by the user if
desired). Furthermore, the proposed mechanisms allow version generation to
be done in any manner desired, allowing the user to specify space-time
trade-offs, derivation relationships, policies for creating new versions
(as opposed to including the changes in an existing version for
efficiency), et ceterae.

The additional information contained in a Catoan versioned object to
provide version maintenance and the operations on such objects are shown in
Figure 8.

A versioned Catoan-object consists of four types of information:
information describing how to generate the version (VERSION_GEN_INFO), the
logical children of the node in the version naming hierarchy (CHILDREN),
the logical parent of the node (PARENT), and whether some other version is
physically derived from this version (USED_AS_BASE).
version_new: PROCEDURE(v_name: version_name,
                       v_base: @versioned_catoan_object,
                       update_info: updates_specification,
                       v_gener: version_generating_program)
             RETURNS(new_version: @versioned_catoan_object)
             EXCEPTION(version_exists)
    (* Creates a new version; SELF = parent. *);
version_delete: PROCEDURE(v_name: version_name)
             EXCEPTION(version_nonexistent)
    (* Remove a version from the history; SELF = parent. *);
version_get: PROCEDURE(v_name: version_name)
             RETURNS(v_obj: @versioned_catoan_object)
             EXCEPT(version_nonexistent)
    (* Translate a name into a versioned object; SELF = parent. *)
version_read: PROCEDURE
             RETURNS(o: @catoan_object)
    (* Translates a version into an object; SELF = the version. *)
version_replace: PROCEDURE(v_name: version_name,
                       v_base: @versioned_catoan_object,
                       update_info: updates_specification,
                       v_gener: version_generating_program)
             EXCEPTION(version_nonexistent, version_not_replaceable)
    (* Replace a (leaf) version with a new one. *);

additional_versioning_information(updates_specification) = TYPE RECORD
    version_gen_info: RECORD
        base_version: @versioned_catoan_object;
        updates: updates_specification;
        version_gen_pgm: @version_generating_procedure;
    END; !version_gen_info
    children: ARRAY(*) OF RECORD
        name: version_name;
        version: @versioned_catoan_object;
    END; !children
    parent: RECORD
        name: version_name;
        version: @versioned_catoan_object;
    END; !parent
    used_as_base: BOOL;
END; !versioned_catoan_object

version_name(size: 0 TO 100) = TYPE RECORD
    length: 0 TO size;
    chars: ARRAY(1 TO size) OF
        UNION("0" TO "9", ".");
END; !version_name;


Figure  8:  Additional  Information  and  Operations  for 
Version  Maintenance. 


The VERSION_GEN_INFO contains three pieces of information. The
BASE_VERSION denotes the version from which the current version is
physically derived. To generate the current version, as is done by
VERSION_READ, start with the BASE_VERSION and apply the UPDATES. The
UPDATES specify the transformation which the base version must undergo to
obtain the current version. The UPDATES are applied by the
VERSION_GENERATING_PROCEDURE, in which the semantics of the UPDATES are
embodied. The minimal definition of VERSION_GENERATING_PROCEDUREs is shown
in Figure 9.
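
The recursive generation performed by VERSION_READ might be sketched as
follows. This is Python rather than Syspal, and it rests on stated
assumptions: the class and field names are hypothetical, the contents of a
version are taken to be a list of text lines, and the generating procedure
is handed the already generated contents of the base rather than the name
of the base.

```python
from dataclasses import dataclass
from typing import Callable, Optional

Text = list[str]  # assumed form of CONTENTS: the lines of a program


@dataclass
class Version:
    base: Optional["Version"]                  # BASE_VERSION; None if there is none
    updates: object                            # UPDATES, opaque to the manager
    generate: Callable[[Text, object], Text]   # the version generating procedure


def version_read(v: Version) -> Text:
    """Generate a version: generate its base (recursively), then apply UPDATES."""
    base_text = version_read(v.base) if v.base is not None else []
    return v.generate(base_text, v.updates)


def append_lines(base_text: Text, updates: object) -> Text:
    """A deliberately trivial generating procedure: updates are lines to append."""
    return base_text + list(updates)


v1 = Version(None, ["a"], append_lines)
v2 = Version(v1, ["b"], append_lines)
print(version_read(v2))  # ['a', 'b']
```

Only the generating procedure interprets the updates; the version manager
merely stores them and passes them along.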

The definition of the UPDATES_SPECIFICATION (the parameter to the
VERSIONED_CATOAN_OBJECT type) is up to the user, as is that of the
VERSION_GENERATING_PROCEDURE. The only requirements are that the
VERSION_GENERATING_PROCEDURE meet the proper interface, and that the
VERSION_GENERATING_PROCEDURE and UPDATES_SPECIFICATION be compatible.
version_generating_procedure(updates_specification, contents_type) =
TYPE
    PROCEDURE(base: version_name, updates: updates_specification)
    RETURNS(contents_type)
    EXCEPTION(versioned_object_nonexistent_base,
              versioned_object_inconsistent_updates)
    (* UPDATES_SPECIFICATION is a type definition describing the
       form of the updates.
       CONTENTS_TYPE describes the form of the CONTENTS of the
       version.
       BASE is the version from which this version is physically
       derived.
       UPDATES is the updates to be run against the base. *);


Figure 9: Definition of VERSION_GENERATING_PROCEDUREs.


VERSION_REPLACE allows certain versions to be mutable, rather than
immutable, so that changes to certain versions need not create a new
version (though one could be made, if desired). Any version with a child
becomes immutable, and any version which is the BASE_VERSION of some other
version also becomes immutable. However, if a version is a leaf in the
naming structure, and no other versions depend upon it, it can be changed.
This is an efficiency refinement, and allows small changes to be readily
incorporated.

The VERSION_DELETE operation is not totally straightforward; it cannot
merely remove the version. Some other version may be using the
to-be-deleted version as its BASE_VERSION. If deleting a version will
remove a BASE_VERSION, either the version cannot be deleted, or the
information in it must be included in those versions which depend on the
version to be deleted. This may require a cross-referencing mechanism,
similar to that presented in Section III.C.2.
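
The mutability and deletion rules just described reduce to two simple
tests. The following Python sketch is an assumption-laden illustration:
the class and function names are hypothetical, and only the CHILDREN and
USED_AS_BASE information of Figure 8 is modeled.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    children: dict = field(default_factory=dict)  # CHILDREN: name -> Node
    used_as_base: bool = False                    # USED_AS_BASE


def can_replace(v: Node) -> bool:
    """A version stays mutable only while it is a leaf that nothing derives from."""
    return not v.children and not v.used_as_base


def can_delete_directly(v: Node) -> bool:
    """A base version cannot simply be removed; its information must first be
    folded into the versions which depend on it."""
    return not v.used_as_base


leaf = Node("2")
parent = Node("1", children={"2": leaf})
print(can_replace(leaf), can_replace(parent))  # True False
```

The text states no rule for deleting a version which still has logical
children, so the sketch does not guess at one.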

The  CHILDREN  field  specifies  those  versions  which  are  immediate 
children  of  the  current  version.  The  CHILDREN  fields  of  all  the  versions 
of  a  versioned  object  specify  the  logical  relationships  among  the  various 
versions, as described above. Because the CHILDREN information and the
VERSION_GEN_INFO information are separate, the logical derivation of a
version need not be related to the physical derivation of the version.

The CHILDREN field attaches names to the logically derived children of
the current version. The name of the child, together with the names of all
the eventual parents of the child, specifies the position of the child in
the version hierarchy. See Section III.D.1 for a discussion of version
naming.

The PARENT information indicates the version which is the logical
parent of this version. It allows tracing back up the version hierarchy
when necessary.
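
How the CHILDREN and PARENT information cooperate can be sketched in
Python. The names below are hypothetical: the lookup function mirrors the
role of VERSION_GET by walking down through CHILDREN, and a second
function follows PARENT back up to rebuild a full name. The sketch assumes
that version names are written with a leading period, as in ".3.1.2".

```python
from dataclasses import dataclass, field
from typing import Optional


class VersionNonexistent(KeyError):
    """Raised when a name does not denote a version in the hierarchy."""


@dataclass(eq=False)
class VersionNode:
    name: str
    parent: Optional["VersionNode"] = None        # PARENT
    children: dict = field(default_factory=dict)  # CHILDREN: name -> node

    def add_child(self, name: str) -> "VersionNode":
        child = VersionNode(name, parent=self)
        self.children[name] = child
        return child


def version_get(root: VersionNode, v_name: str) -> VersionNode:
    """Translate a name into a node by descending one qualifier at a time."""
    node = root
    for qualifier in v_name.strip(".").split("."):
        try:
            node = node.children[qualifier]
        except KeyError:
            raise VersionNonexistent(v_name) from None
    return node


def full_name(node: VersionNode) -> str:
    """Trace back up the hierarchy through PARENT to rebuild the full name."""
    parts = []
    while node is not None:
        parts.append(node.name)
        node = node.parent
    return ".".join(reversed(parts))


crl = VersionNode("CRL")
r3 = crl.add_child("3")
r3.add_child("1").add_child("2")
print(full_name(version_get(crl, ".3.1.2")))  # CRL.3.1.2
```

Nothing here consults the VERSION_GEN_INFO: logical position and physical
derivation remain independent.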

To demonstrate how a VERSION_GENERATING_PROCEDURE and an
UPDATES_SPECIFICATION might be defined, consider an example: maintaining
versions of a program. The version history is that of Figure 7.
Logically, the UPDATES_SPECIFICATION could be a collection of "commands,"
specifying operations like "delete" or "insert" on a particular line of the
document. (This is similar to the record-oriented update programs which
exist in some batch-oriented computing systems for including updates in the
source for a program.)

The VERSION_GENERATING_PROCEDURE would take the BASE_VERSION and "run
the UPDATES against" the base. The result of this process is the text of
the version represented by the BASE_VERSION and the UPDATES. Each pair of
<UPDATES, BASE_VERSION> could represent a different logical version of the
document (1), depending on how the VERSION_GENERATING_PROCEDURE interpreted
the UPDATES relative to the BASE_VERSION.
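
One possible reading of "running the UPDATES against" a base is sketched
below in Python. The command format is an assumption made for the sake of
the example: ("delete", n) removes base line n, and ("insert", n, text)
places text after base line n, with n = 0 meaning the top of the document.
Line numbers always refer to the base, never to the partially updated text.

```python
def apply_updates(base_lines, updates):
    """Run line-oriented update commands against the lines of a base version."""
    deleted = set()
    inserts = {}  # base line number -> lines to place after it
    for command in updates:
        if command[0] == "delete":
            deleted.add(command[1])
        elif command[0] == "insert":
            inserts.setdefault(command[1], []).append(command[2])
        else:
            raise ValueError(f"unknown command {command[0]!r}")
    touched = deleted | set(inserts)
    if 0 in deleted or any(not 0 <= n <= len(base_lines) for n in touched):
        raise ValueError("update refers to a line outside the base")
    result = list(inserts.get(0, []))
    for n, line in enumerate(base_lines, start=1):
        if n not in deleted:
            result.append(line)
        result.extend(inserts.get(n, []))
    return result


base = ["read x", "print x"]
print(apply_updates(base, [("insert", 1, "x := x + 1"), ("delete", 2)]))
# ['read x', 'x := x + 1']
```

With an empty base, a list of inserts at line 0 builds a version from
"nothing," which is how an initial version could be expressed in this
scheme.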

How does one create the initial version of such a program? First, a
VERSIONED_CATOAN_OBJECT, CRL, is created. The VERSION_GEN_PGM is specified
to be the "Syspal_version_editor," which would apply the change directives
properly. The BASE_VERSION is specified as NULL, indicating that there is
no version on which this one is based. Then, the UPDATES which will create
the initial version of CRL from "nothing" are supplied. CRL's initial
version is now complete.


(1) In general, only a small subset of the <UPDATES, BASE_VERSION> pairs
actually represent meaningful versions.
As an example, suppose that CRL.3.1.3 is to be created under CRL.3.1
(that is, CRL.3.1 is to be CRL.3.1.3's parent). For whatever reason,
CRL.3.1.3 will be derived from CRL.3.2. What follows is a description of
generating CRL.3.1.3 at the lowest level.

Call the version to be created NEW_VER, and let ADAM denote the most
ancient ancestor in the version tree (in this case, CRL). First, the
version on which NEW_VER is based must be obtained. The statement

    base := version_get(adam, base_name);

finds the version denoted by BASE_NAME (which would have the value ".3.2")
and assigns it to BASE. The program of CRL.3.2 would be obtained by

    original_pgm := version_read(base);

This program would be provided as input to an editor, the output of which
would be the new version of the program's source, which would be assigned
to NEW_PGM. The incremental differences between ORIGINAL_PGM and NEW_PGM
could be determined by

    differences := Syspal_differences(original_pgm, new_pgm);

and everything is almost ready to complete the process. The parent of
NEW_VER must be obtained:

    parent := version_get(adam, parent_name);

assuming ".3.1" is the value of PARENT_NAME. Now, NEW_VER can be included
in the version hierarchy of CRL, using the statement

    new_ver := version_new(parent, new_name, base,
                           differences, Syspal_version_editor);

where NEW_NAME has ".3" as its value. This completes the creation of
CRL.3.1.3.
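
The step performed by Syspal_differences in the walk-through above could
be approximated with Python's standard difflib module. This is a sketch,
not the Syspal routine: the function name and the command format
(("delete", n) and ("insert", n, text), with n a line number in the
original) are assumptions chosen for illustration.

```python
import difflib


def differences(original, new):
    """Derive line-oriented update commands turning `original` into `new`.

    Line numbers are 1-based and refer to `original`; an insert at n places
    the text after original line n (n = 0 means before the first line).
    """
    commands = []
    matcher = difflib.SequenceMatcher(a=original, b=new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("delete", "replace"):
            commands.extend(("delete", n) for n in range(i1 + 1, i2 + 1))
        if tag in ("insert", "replace"):
            commands.extend(("insert", i2, text) for text in new[j1:j2])
    return commands


original_pgm = ["read x", "print x"]
new_pgm = ["read x", "x := x + 1", "print x"]
print(differences(original_pgm, new_pgm))  # [('insert', 1, 'x := x + 1')]
```

Only the commands would be stored with the new version; the full text is
regenerated from the base when the version is read.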


To obtain the program as of a particular version, the version's name is
supplied to VERSION_GET, which finds the version in the version naming
hierarchy. VERSION_READ is then invoked, which passes the version's base
and updates to the version generator (VERSION_GEN_PGM), which returns the
version.

At  some  point,  after  the  version  history  becomes  very  large,  generating 
a  given  version  may  take  a  very  long  time.  What  could  then  be  done  is  to 
create a version which is complete (similar to the initial version).
Thereafter,  future  new  versions  could  be  generated  off  this  new  "complete" 
version,  rather  than  having  to  incrementally  generate  all  the  previous 
versions  before  generating  the  desired  one. 
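
The idea of a "complete" version can be sketched in Python under
deliberately simple assumptions: the names are hypothetical, and the
updates of a version are just lines appended to its base. The length of
the derivation chain is what makes generation slow, and a complete copy
brings that length back to zero.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class V:
    base: Optional["V"]   # version this one is physically derived from
    updates: list         # lines appended to the base (a trivial update form)


def read(v: V) -> list:
    """Generate a version by regenerating its whole chain of bases."""
    inherited = read(v.base) if v.base is not None else []
    return inherited + v.updates


def depth(v: V) -> int:
    """Number of earlier versions that must be generated first."""
    return 0 if v.base is None else 1 + depth(v.base)


def complete_copy(v: V) -> V:
    """A version with no base whose updates rebuild the entire text."""
    return V(None, read(v))


chain = V(V(V(None, ["a"]), ["b"]), ["c"])
snapshot = complete_copy(chain)
print(depth(chain), depth(snapshot), read(snapshot))  # 2 0 ['a', 'b', 'c']
```

New versions based on the snapshot then pay only for their own updates,
at the cost of the space taken by the full copy.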

III.D.3. More on Version Naming.

In addition to the regular version names, one might want to have
"sliding" names for versions. For example, when developing a program, one
often has a backup, a current, and a test version of the program. Upon
determining that the test version is ready for installation, one would want
to change the meanings of the names "backup," "current," and "test" to
reflect the new state. This can be accomplished, and the general problem
of "sliding" names can be solved, by introducing "variables" to reference
versions.

A simple method of specifying variables for version references is to
include  an  optional  user-defined  procedure  for  variable  assignment  which 
would  be  called  whenever  a  new  version  is  created.  This  procedure,  or 
another  one,  could  also  be  called  directly  by  the  user  when  he  wanted  to 
update  the  variable  assignments.  The  variables'  names  and  the  objects  they 
referenced  could  be  stored  in  the  named  DIRECTORY  in  the  highest-level 
Catoan-object. 
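
Such variables might behave as in the following Python sketch. The
variable names come from the example above; the two procedures, and the
policy that every newly created version becomes "test," are hypothetical
choices a user might make, not part of Catoan.

```python
def on_new_version(variables: dict, new_version_name: str) -> dict:
    """User-defined hook, called whenever a new version is created:
    here, the newest version always becomes the test version."""
    updated = dict(variables)
    updated["test"] = new_version_name
    return updated


def install_test(variables: dict) -> dict:
    """Called directly by the user: slide current to backup and test to current."""
    return {
        "backup": variables["current"],
        "current": variables["test"],
        "test": None,
    }


names = {"backup": None, "current": ".1", "test": None}
names = on_new_version(names, ".2")
names = install_test(names)
print(names)  # {'backup': '.1', 'current': '.2', 'test': None}
```

The dictionary stands in for the named DIRECTORY of the highest-level
Catoan-object, where the variables and their referents would be stored.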

It may be desirable to allow a general network of version names, rather
than just a hierarchy. Catoan supports a general network for naming
objects; version naming may require similar capabilities. At this point,
the value of a version network has not been proven. Despite always
referring to a hierarchy of naming versions, though, Catoan will support a
network of versions using the definition presented in Figure 8 above. Any
restrictions to a hierarchy would have to be done in the VERSION_NEW
operation.

The operations presented here are very low level. Presumably, a
higher-level interface to version maintenance would be presented to the
user by, for example, the editor.


III.E. Summary


Definitions  of  the  "Basic  Catoan-Object,"  a  "refined"  object,  and  a 
"versioned"  object  have  been  presented  in  this  chapter.  The  operations  of 
the  objects,  and  sample  representations,  have  been  described.  Issues  of 
naming,  protection,  and  (in  some  cases)  efficiency  were  mentioned. 


IV. AN EXAMPLE: A SYSPAL PROGRAM OBJECT

In this chapter, I shall demonstrate how Catoan might be used. The
demonstration will be based on an example: a "Syspal program object." A
Syspal program object is a convenient way to store a program written in
Syspal using Catoan as the object storage mechanism.

In this object, one would store a Syspal program, though the same
general structure, if not the exact structure, could be used for storing
programs written in most languages. The Syspal program object is an
extension of the versioned Catoan object described in Section III.D, and
the cross-referenced Catoan object described in Section III.C.2. In
addition to the operations pertaining to Syspal programs, the operations of
the versioned Catoan object and those for cross-referencing are part of the
definition of the Syspal program object.

IV.A. Motivation.

Classically, a program is stored as a collection of files, each one
containing some portion of the program. For example, one might have a
source file, a documentation file, an object-code file, an interface file,
a load-able (executable-code) file, and so on. These are usually
differentiated by a suffix indicating the kind of file: ALG68 for an
ALGOL68 source file, PLI for a PL/I source file, DOC for a documentation
file, OBJ for an object-code file, et ceterae. Each file is individually
visible to the user.

A  typical  scenario  in  a  system  like  this  is  as  follows.  A  user  wants 
to  write  a  program  to  help  him  balance  his  checkbook.  Assume  he  wants  to 
use  the  Syspal  programming  language.  He  types  something  like 

edit  CheckBook  Syspal  new 

meaning  that  a  new  file,  of  "type"  Syspal,  named  "CheckBook,"  is  to  be 
edited.  Upon  finishing  his  first  attempts  at  writing  the  program,  he  might 
type 

run  CheckBook 

with  a  resultant  error  message  like 

NO SUCH FILE: CheckBook.LOAD

which  is  reported  because  he  had  never  compiled  the  program.  Upon 
discovering  his  error,  a  likely  follow-up  might  be 

compile CheckBook

for which another error message might be generated, because there is no
COMPILE  command.  Finally,  after  much  aggravation,  the  user  might  realize 
that  he  should  type 

Syspal  CheckBook 
which  would  compile  his  program. 


Thinking that he can now run his program (assuming it compiled
properly), the example user might type

run  CheckBook 

for  which  an  error  message  like  the  one  he  received  the  last  time  he  tried 
RUN  would  be  elicited.  Eventually,  he  might  realize  that 

link  CheckBook 


is needed, after which

run  CheckBook 

would  work  —  assuming  that  SYSPAL,  LINK,  and  RUN  did  not  require  the  user 
to  supply  the  proper  suffixes  for  CheckBook. 

How  many  times  does  the  user  actually  care  about  the  object-code  file, 
or  the  load-able  file?  How  many  times  does  the  user  actually  care  about 
compiling,  or  about  linking  (except  to  check  for  compile-time  errors)?  Why 
can't  RUN  simply  produce  a  properly  executable  form  of  the  program? 
Abstractly, the user is writing a Syspal program, not a machine-language
program; what does he care about the representation of his program?
(Indeed, even if he were writing a machine-language program, the
representation of the program may be of no concern to him.)

The example presented in this chapter addresses these problems. The
Syspal program object defined in the next section consists of several
internal parts, which correspond to the classical object-code, load-able,
documentation, source, et ceterae files. Normally, these are of no concern
to the user, and so need not be dealt with explicitly (though the ability
to do so exists).

IV.B. Definition.

Like any abstract object, a Syspal program object is defined by the
operations one performs on it. The primary operations one performs on such
objects are NEW, DELETE, EDIT, RUN, EDIT_DOCUMENTATION, and DEBUG. Secondary
operations, which exist more for efficiency than for completeness, include
COMPILE and RESOLVE_REFERENCES. In addition to those operations specific to
Syspal programs, the operations of the versioned Catoan-object and the
cross-referenced Catoan-object are part of the definition of the Syspal
program object. These extra operations are available directly to the user
because of the %VISIBLY_EXTEND statement. Figure 10 shows the interface for
and representation of the Syspal-program object.

new: PROCEDURE
     RETURNS(p: @Syspal_program)
    (* Instantiates a new Syspal program. *);
delete: PROCEDURE
    (* Destroys a Syspal program and its subsidiary objects. *);
edit: PROCEDURE
    (* Allows modification to a Syspal program. *);
run: PROCEDURE
    (* Executes the Syspal program. *);
edit_documentation: PROCEDURE
     EXCEPT(syspal_program_no_documentation)
    (* Modifies the documentation of a Syspal program. *);
compile: PROCEDURE
     EXCEPT(syspal_program_compilation_failed)
    (* Compiles the Syspal program. *);
resolve_references: PROCEDURE
     EXCEPT(syspal_program_unresolveable_reference)
    (* Resolves external references (calls the system LINKER). *);
debug: PROCEDURE
    (* Invokes the DEBUGGING subsystem. *);

%VISIBLY_EXTEND versioned_catoan_object,
                cross_referenced_catoan_object;

SELF: RECORD
    program: versioned_catoan_object;
    xref: cross_reference_information;
    (* Use of the VERSIONED_CATOAN_OBJECT:
       CONTENTS = source code.
       unnamed DIRECTORY slot 1 = object code.
       unnamed DIRECTORY slot 2 = documentation.
       unnamed DIRECTORY slot 3 = interface.
       unnamed DIRECTORY slot 4 = object code with external
                                  references resolved.
       named DIRECTORY slots = sub-programs. *)
END; !SELF


Figure 10: A Syspal-Program Object.

The NEW operation is invoked when a Syspal-program object is created.
It takes no arguments, and returns as a result the new object. Usually,
this operation is automatically invoked by the EDIT operation on a new
program. NEW initializes the various fields in the representation of the
program before returning.

DELETE  destroys  a  Syspal-program,  and  all  of  its  underlying  sub-objects 
and  versions. 

The EDIT operation is invoked when changes are to be made to the
program. As mentioned above, EDIT will invoke NEW if a new program is
being edited. The only argument of the operation is the implicitly
supplied program object; it returns nothing.


RUN attempts to execute some representation of the program. For Syspal
programs, this may require compiling first. RUN verifies that valid,
current executable code exists for the source; if it does not, RUN will
implicitly invoke the COMPILE operation. If the supporting system requires
pre-execution binding (linking), RUN will also invoke the
RESOLVE_REFERENCES operation. Once current executable code is obtained,
RUN will transfer execution-control to the program.
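
The behavior of RUN can be sketched in Python with a toy object. All of
this is illustrative: the class is hypothetical, compiling and linking are
simulated by building strings, and "current" is taken to mean that the
object code was produced from the source as it now stands.

```python
class SyspalProgram:
    """Toy stand-in for a Syspal program object; compile and link are simulated."""

    def __init__(self, source, needs_linking=True):
        self.source = source
        self.needs_linking = needs_linking  # does the host need pre-execution binding?
        self.object_code = None             # (source compiled from, simulated code)
        self.bound_code = None
        self.log = []                       # records which operations were invoked

    def edit(self, new_source):
        self.source = new_source

    def _object_code_is_current(self):
        return self.object_code is not None and self.object_code[0] == self.source

    def compile(self):
        self.log.append("compile")
        self.object_code = (self.source, f"obj({self.source})")
        self.bound_code = None              # any earlier binding is now stale

    def resolve_references(self):
        if not self._object_code_is_current():
            self.compile()
        self.log.append("resolve_references")
        self.bound_code = f"bound({self.object_code[1]})"

    def run(self):
        if not self._object_code_is_current():
            self.compile()                  # implicit COMPILE
        if self.needs_linking and self.bound_code is None:
            self.resolve_references()       # implicit RESOLVE_REFERENCES
        self.log.append("run")
        return self.bound_code if self.needs_linking else self.object_code[1]


p = SyspalProgram("v1")
p.run()
print(p.log)  # ['compile', 'resolve_references', 'run']
```

A second RUN with unchanged source logs only "run"; after an EDIT, the
next RUN compiles and binds again without the user asking for it.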

EDIT_DOCUMENTATION provides access to the DOCUMENTATION portion of the
Syspal program.

DEBUG  calls  a  debugging  facility,  allowing  the  programmer  to  control 
the  execution  of  the  program,  to  examine  the  state  of  its  execution,  et 
ceterae. 

The secondary operations, COMPILE and RESOLVE_REFERENCES, produce
object- and bound-code, respectively. As mentioned, they exist primarily
for efficiency. They would probably be used by a programmer to be sure
that an error would not occur if someone else should cause the operations
to be implicitly invoked.

In  addition  to  the  explicitly  defined  operations  listed  above,  the 
operations  of  version  management  and  cross-referencing,  as  well  as  those  of 
the  basic  Catoan-object,  are  available  for  use  with  Syspal  program  objects. 
The  %VISIBLY_EXTEND  pseudo-statement  causes  the  named  interfaces  to  be 
included  in  this  one.  (Appendix  A  describes  this  in  a  little  more  detail). 
Syspal programmers can treat Syspal program objects as ordinary
Catoan-objects, including them in other Catoan-objects, including other
Catoan-objects in them, explicitly creating new versions, accessing the
cross-reference information, et ceterae.

For example, assume that a user named "Ribak" was writing a system
composed of several Syspal programs. One of the programs (called "DRIVER")
is the top-level program, which controls dispatching the other parts of the
subsystem. One way to reflect this structure in the external structure of
the programs is to have the other parts of the subsystem be sub-objects of
DRIVER, included in the DIRECTORY of the Catoan-object used to store the
DRIVER Syspal program object. Then, Ribak could easily see the system's
structure by NAMED_READing the DIRECTORY of the Catoan-object.

IV.C. Use.

To use the Syspal program object, a user would invoke the EDIT
operation. EDIT would obtain the source code of the program, or initialize
it to empty if the program was new. The user would make whatever changes
had to be made, replace the old edition of the program with the updated one
(or, perhaps, create a new version instead), and terminate the editing
session.

If the editor was able to check some or all of the syntax and semantics
of the program, a COMPILE merely to verify that no compilation errors
existed would be unnecessary. If the editor was unable to perform such
checks, the user might want explicitly to COMPILE the program if he was not
going to run it immediately, and someone else might try to RUN it before he
had a chance to do so. Otherwise, he could invoke the RUN operation, which
would automatically invoke COMPILE and, if necessary, RESOLVE_REFERENCES.

If an error is discovered while RUNning the program, the DEBUG
operation could be invoked, allowing the programmer to examine the program
and its environment. If changes were made to the program while debugging,
EDIT could be called directly by DEBUG, thereby automatically incorporating
changes which were made while DEBUGging into the permanent copy of the
program.

Assume that the programmer finishes DEBUGging the program, and then
neglects to COMPILE the program. One of the users of the program then
tries to RUN the program. At this point, the COMPILE operation is
implicitly invoked, and the program is transformed into some form which can
be executed by the host system. The user had no knowledge of this
transformation; it is an implementation detail.

The Syspal program object is an extension of the Catoan-object. This
allows the programmer to use the properties of Catoan-objects when thinking
about managing his programs. For example, if someone has written a utility
program which produces a copy of a Catoan-object, that same program could
probably be used with Syspal program objects with very little, if any,
modification. If other computing systems or other naming environments (see
Chapter Five) could reference his Catoan-objects, then they could,
likewise, reference his Syspal program objects. This allows the issues of
object management to be left to the object manager, regardless of the use
to which the objects are being put, regardless of the extensions which are
made of the basic Catoan-object.

IV.D. Summary.

Many  people  do  little  with  computers  but  write  programs  on  and  for 
them.  Generally,  the  abstractions  available  for  their  use  for  actually 
writing  the  programs  are  very  primitive.  The  Syspal  program  object 
presented  above  is  a  high-level  abstraction  for  writing  and  storing 
programs which is based on the Catoan-object and its extensions.

As mentioned in Chapter One, I do not assume that Catoan is the only
manager for named, permanent objects that exists in the system. Therefore,
Catoan's is not the only naming environment in the computing system. If
there exist other naming schemes, and another naming environment is created
which is disjoint from the one I propose, what are the implications? Are
the name spaces forever disjoint? Is there a way to refer to objects in
one namespace while within another? Is there a way to transfer objects
from one namespace to another, either from within either of the two
namespaces in question, or from a third one?

V.A. Disjoint Naming Spaces.

Given the existence of more than one object manager, it is very
probable that the objects of one system cannot be handled by the others.
In classical file systems, internal storage formats may differ, the system
overhead information stored may differ, the structure of the files may
differ; in fact, the "type" (in the programming language sense of the
word) of the files may be incompatible, so that the different kinds of
files are implemented by different modules.

In Catoan, the naming mechanism is part of the object structure, and is
handled by the object management mechanism. Separating names from objects
is not part of Catoan's underlying philosophy. Therefore, regardless of
the structures of other object managers, regardless of the naming
mechanisms of other object managers, if an object is not a Catoan-object,
it cannot be named within Catoan.

If Catoan-objects can be named and accessed directly by some other
object manager, the naming structures are not disjoint. In this case, data
transfer is no problem, and is, indeed, a moot point: the objects of both
systems are accessible from one of the systems.

Let us assume that not only can Catoan not access non-Catoan-objects,
but other systems likewise cannot access Catoan-objects. In this
case, the naming structures are truly disjoint, and data in the objects of
one system cannot be transferred directly into objects of the other. What
is needed for such data transfer is some procedure which can bridge the two
naming structures.

To be able to write a "bridging" procedure, it must be possible to
access both object managers from the same procedure. This requires that
the interface for both systems be available to the procedure. The
procedure must be able to name and to access (in a protection sense) the
interfaces; if naming can be done at this level directly, with internal
names (segment numbers, capabilities), then providing the procedure with
the internal unique identifiers of the two object managers produces the
necessary availability.

If naming cannot be done with internal names, then a mechanism is
needed to allow translation of external names (character strings) to
internal ones. This requires, essentially, another name manager for
"special" interfaces which are needed between, among, and above the normal
naming structures.

Once the interfaces (and modules) for both object managers are
available to the bridging procedure, transferring data between the two
naming environments involves obtaining the necessary information from one
environment (using the operations of its objects), and supplying that
information to the other environment (using the operations on its objects).
The author of the procedure must, therefore, know the interfaces for both
systems. Such bridges might be provided as part of a system-wide library
of "utility" routines.
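
A bridging procedure of the kind just described can be sketched briefly.
The sketch is in Python rather than Syspal, and everything in it is an
assumption made for illustration: the two manager classes, their
operation names, and the object names are hypothetical and stand for no
real system. What it shows is that the bridge is handed both managers and
must know the (different) operations of each.

```python
class HierarchicalManager:
    """Hypothetical manager whose objects are named by character strings."""

    def __init__(self):
        self._files = {}

    def write_file(self, path, data):
        self._files[path] = data

    def read_file(self, path):
        return self._files[path]


class NumberedManager:
    """Hypothetical manager whose objects are named by integers."""

    def __init__(self):
        self._slots = {}

    def put(self, number, data):
        self._slots[number] = data

    def get(self, number):
        return self._slots[number]


def bridge(source, source_name, target, target_name):
    """Copy one object's data from a HierarchicalManager to a NumberedManager.

    The procedure can name both managers (they are its arguments) and must
    know the operations of each, since the two interfaces differ.
    """
    data = source.read_file(source_name)   # obtain from one environment
    target.put(target_name, data)          # supply to the other


files, slots = HierarchicalManager(), NumberedManager()
files.write_file("thesis", b"chapter one")
bridge(files, "thesis", slots, 6)
```

A bridge in the opposite direction would be a second procedure, which is
one reason such routines are natural candidates for a shared library.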

V.B.  A  Standard  Interface  for  Filing  Systems. 

An alternative to forcing someone who is trying to transfer data
between object managers into learning the idiosyncrasies of both systems is
to have all object managers meet the same interface (if standard data
transfer is to be possible). This interface would specify the minimal set
of operations required of an object manager, and would also allow data to
be transferred freely among object management systems and their naming
environments.

Because of the wide variety of storage techniques, protection schemes,
and information collected, access to the "overhead" information will not be
included in the "standard, minimal interface" which will be defined.

Because there are many ways to interpret names, many ways to organize a
naming structure, many ways to attach semantics to a naming structure (be
it a hierarchy, a network, or even a list), passing components of names to
the object manager may not make sense. Because there are many ways to
structure data, a limited means for accessing an object manager's data will
be provided. Figure 11 shows the standard, minimal interface for object
managers.


lookup: PROCEDURE(name: string(*), root: @TYPE(SELF))
        RETURNS(obj: @TYPE(SELF))
        EXCEPTION(name_not_found(index: INTEGER),
                  name_invalid(index: INTEGER))
    (* Translates a character-string name into an object
       reference, relative to "root." "Index" is the
       position in "name" up to which the name could be
       found or parsed. *);

contents_read: PROCEDURE
        RETURNS(cont: @contents_type)
    (* Extracts the CONTENTS from the object. *);

contents_set: PROCEDURE(cont: @contents_type)
        EXCEPTION(contents_type_inappropriate)
    (* Places "cont" in the CONTENTS of the object. *);


Figure  11:  Standard,  Minimal  Interface  for  a  Filing  System. 


Names are handled in their entirety only, and are relative to some
point which is supplied by the caller. This "root" pointer may be NULL, in
which case the object manager determines the root. If the root is not
NULL, the name is parsed relative to the supplied root. For example, if
one wanted to have a Multics file system parse the name
"^udd^CSR^Marcum^thesis," the root would not have to be specified, because
Multics has one global root. If "Marcum^thesis" were to be located
relative to "^udd^CSR", then "^udd^CSR" could be supplied as the root.

If the above example were to be executed in Catoan, and the object
"thesis," a sub-object of the object "Marcum," were to be found, a pointer
to CSR would be supplied as the root, and "Marcum^thesis" would be supplied
as the name.
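
The behaviour of "lookup" with and without a supplied root can be
illustrated by a small sketch. It is written in Python, not Syspal, and
is only one possible reading of Figure 11: the "^" separator, the Node
class, and the way "index" is computed (the number of characters of the
name successfully resolved) are assumptions made for the example.

```python
SEP = "^"   # assumed component separator, as in "Marcum^thesis"


class NameNotFound(Exception):
    """Counterpart of name_not_found(index) in Figure 11."""

    def __init__(self, index):
        super().__init__(f"name not found at position {index}")
        self.index = index


class Node:
    """Hypothetical named object: a mapping from component to sub-object."""

    def __init__(self, **children):
        self.children = children


def lookup(name, root, global_root):
    """Resolve the whole of `name` relative to `root` (or the global root)."""
    node = global_root if root is None else root
    pos = 0                                # characters of `name` resolved so far
    for part in name.split(SEP):
        if part:                           # skip the empty piece before a leading SEP
            if part not in node.children:
                raise NameNotFound(pos)
            node = node.children[part]
        pos += len(part) + 1               # the component and its separator
    return node


thesis = Node()
csr = Node(Marcum=Node(thesis=thesis))
top = Node(udd=Node(CSR=csr))
```

With these definitions, lookup("^udd^CSR^Marcum^thesis", None, top) and
lookup("Marcum^thesis", csr, top) reach the same object; the caller never
passes individual components, only a whole name and an optional root.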

Just as names are handled in their entirety, the data contained in an
object are accessible only in their entirety. One is allowed to _SET and
_READ the CONTENTS of some object as a whole. Returned by _READ is a
pointer to the CONTENTS, which may be of arbitrary type, just as the
CONTENTS of a Catoan object may be of arbitrary type. _SET's argument is a
pointer to a datum of arbitrary type to be used as the CONTENTS of the
object.

Some object managers may have to place restrictions on the types of the
objects which are the CONTENTS being stored. It is the responsibility of
the object manager to verify that the type of the CONTENTS is sensible for
that particular style of object manager. The exception
CONTENTS_TYPE_INAPPROPRIATE is provided to allow a standard mechanism for
signalling such a problem.

Catoan does not meet, as described so far, the standard, minimal
interface. The operations on the CONTENTS (_SET and _READ) are compatible,
but an additional DIRECTORY operation is needed to take a full name and a
root point, and return a pointer to the named object. This is a simple
addition, with which Catoan meets the standard, minimal interface of
Figure 11.
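
The two CONTENTS operations, and the type restriction an object manager
may impose, can be sketched as follows. This is a Python illustration
under stated assumptions, not part of Catoan: the class StoredObject and
its "allowed_types" parameter are hypothetical, and the exception class
merely plays the role of contents_type_inappropriate.

```python
class ContentsTypeInappropriate(Exception):
    """Counterpart of the contents_type_inappropriate exception in Figure 11."""


class StoredObject:
    """Hypothetical object whose manager accepts only certain CONTENTS types."""

    def __init__(self, allowed_types=(object,)):
        self._allowed = tuple(allowed_types)
        self._contents = None

    def contents_set(self, cont):
        # The manager, not the caller, decides whether the type is sensible.
        if not isinstance(cont, self._allowed):
            raise ContentsTypeInappropriate(type(cont).__name__)
        self._contents = cont          # the CONTENTS are replaced as a whole

    def contents_read(self):
        return self._contents          # a reference to the CONTENTS, not a copy


text_only = StoredObject(allowed_types=(str, bytes))
text_only.contents_set("chapter one")

anything = StoredObject()              # no restriction, as in Catoan
table = [1, 2, 3]
anything.contents_set(table)
```

An unrestricted manager accepts any datum; a restricted one rejects, for
example, text_only.contents_set(42) and leaves the old CONTENTS in place.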


V.C.  Garbage  Collection. 

Reclaiming storage used by objects which are inaccessible may be
necessary. If such "garbage collection" is needed, how does the existence
of multiple naming environments affect garbage collection?

Garbage collection is a reclamation of the physical storage used by
logical entities (objects) which become inaccessible. Garbage collection
techniques have been a topic of investigation for a long time; they still
are. I shall not discuss the actual techniques here; the interested
reader is referred to [3, 4, 30, 33]. Rather, what follows is a discussion
of the effects of multiple name spaces on garbage collection.

Usually, garbage collection is performed by the object manager. If
this view of garbage collection is taken, all works well while there is
only one object manager. Indeed, all may work well within each of the
individual object managers. Each object manager has enough information to
garbage collect its own objects. What happens, however, if there exist
inter-namespace references? What happens if an arbitrary object can refer
to another arbitrary object, as can happen in Catoan?

A possible solution is to extend the standard, minimal interface for
object managers (see Figure 11) to include operations for communicating
garbage collection information. Suppose two object managers, Catoan and
Namit, exist in one computing system. Let "C1," "C2," et ceterae be
Catoan-objects; let "N1," "N2," et ceterae be Namit-objects. There can be
references in C1 to C2, for example, and there might be references
permitted between two Namit-objects. Objects in Catoan can certainly
reference objects in Namit; whether objects in Namit can reference
Catoan's objects is immaterial.

Perhaps C1 references C2, and C2 references N6. Catoan reaches a stage
when garbage collection is required, and so it scans its objects for
inter-object references. It records the C1-C2 reference. Upon discovering
the C2-N6 reference, it must transmit the information that N6 is referenced
to N6's manager, Namit. How might this be done?

Let us assume that Catoan can determine that N6 belongs to Namit (I
shall return to this issue shortly). Catoan must (conceptually) send a
message to Namit indicating that N6 is referenced from some other naming
environment. Perhaps Catoan would even specify that N6 was referenced from
the Catoan naming environment, by object C2. How would Catoan name N6 to
Namit? If all inter-namespace references are symbolic, Catoan could use
the same name that C2 used. (This also solves the problem of determining
the object manager of N6, mentioned above.) If, however, references are
direct (rather than symbolic), as they could be in Catoan, it would be
necessary to pass to Namit the direct reference (which might be a segment
number). This presents no problem if garbage collection can be done
without object names, as is usually the case.

Direct references pose another problem: how does Catoan determine that
Namit is the manager of N6? Perhaps some extra information is stored with
the reference in C2 to N6 enabling Catoan (or any other object manager) to
determine that the reference is to an object of some other object manager.
(Indeed, some such information is needed to allow an object manager to
determine at least that an object reference is to one of its objects or to
an object of some other object manager.) Another possible solution is to
maintain a directory of references to objects of other object managers.

Regardless of the exact methods for solving the various problems of
inter-namespace references, garbage collection will require much
inter-object manager communication to convey the inter-namespace
references. Furthermore, additional complexity is introduced into the
standard, minimal interface for object managers of Figure 11, into the
information stored for references, and into the mechanics of garbage
collection. Reference [4] contains a discussion of garbage collection in
multiple address spaces with inter-address space references. When the
address spaces are logical rather than physical, when they are name spaces
rather than address spaces, when they are managed by more than one entity,
garbage collection is even more difficult than described in [4].
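
The exchange between Catoan and Namit described above can be made
concrete with a small sketch. It is written in Python and is purely
illustrative: the Manager class, its operation names, and the decision to
store the owning manager alongside each reference (the "extra information"
option) are assumptions, not a design given in this report.

```python
class Manager:
    """Hypothetical object manager that garbage collects only its own objects."""

    def __init__(self, name):
        self.name = name
        self.refs = {}           # object name -> list of (owning manager, object name)
        self.roots = set()       # objects reachable by name from this manager's root
        self.external = set()    # objects another manager has reported as referenced

    def add(self, obj, refs=(), root=False):
        self.refs[obj] = list(refs)
        if root:
            self.roots.add(obj)

    def note_reference(self, obj):
        """The 'message' another manager sends: obj is referenced from outside."""
        self.external.add(obj)

    def trace(self):
        """Return this manager's live objects, reporting outgoing references."""
        live, stack = set(), list(self.roots | self.external)
        while stack:
            obj = stack.pop()
            if obj in live:
                continue
            live.add(obj)
            for owner, target in self.refs[obj]:
                if owner is self:
                    stack.append(target)
                else:
                    owner.note_reference(target)   # inter-namespace reference
        return live

    def collect(self):
        """Reclaim every object that trace() did not reach; return their names."""
        live = self.trace()
        dead = sorted(set(self.refs) - live)
        for obj in dead:
            del self.refs[obj]
        return dead


catoan, namit = Manager("Catoan"), Manager("Namit")
namit.add("N6")
namit.add("N7")
catoan.add("C2", refs=[(namit, "N6")])
catoan.add("C1", refs=[(catoan, "C2")], root=True)
catoan.add("C3")

catoan_freed = catoan.collect()   # scanning C2 reports N6 to Namit
namit_freed = namit.collect()     # N6 survives only because of that report
```

Even in this toy form the cost is visible: Namit may not collect until
every other manager has scanned, and the record of external references is
never retracted here, so a real scheme would need still more communication.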

Another solution, which I prefer, is to make garbage collection the
function of the memory management system. This is especially appealing in
an addressing system in which all references must be made through tagged
"pointers." Such references can be recognized easily by the memory manager
(because they are tagged). Generally, as long as the memory manager can
determine that a reference to an area of storage exists somewhere, the
precise form of addressing is immaterial -- it can be through segment
numbers, disc addresses, capabilities, et ceterae.

If the memory management system can determine that an area of memory is
referenced, regardless of where the reference is located within the memory
system, it can do the garbage collection. The memory management system is
below the object managers. Furthermore, because the memory management
system is part of the operating system kernel, all object managers use the
same (the only) memory manager. Therefore, because a single entity has
access to all the object references, and can determine when something is
and is not an object reference, the problem of garbage collection in
multiple naming environments is solved.
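
A minimal sketch of collection performed below the object managers
follows. It is in Python and assumes a deliberately simple model that is
not taken from this report: memory is a table of numbered segments, and a
tagged pointer is represented by the hypothetical class Ptr, so that the
collector can tell a reference from ordinary data without knowing which
object manager owns a segment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ptr:
    """A tagged pointer: the memory manager recognizes it by its type."""
    segment: int


def reachable(memory, roots):
    """Mark every segment reachable from `roots` by following tagged pointers.

    `memory` maps a segment number to a list of words; a word is either
    plain data or a Ptr. Ownership by an object manager plays no part.
    """
    marked, stack = set(), list(roots)
    while stack:
        seg = stack.pop()
        if seg in marked:
            continue
        marked.add(seg)
        stack.extend(w.segment for w in memory[seg] if isinstance(w, Ptr))
    return marked


def collect(memory, roots):
    """Free unmarked segments in place and return their numbers."""
    dead = sorted(set(memory) - reachable(memory, roots))
    for seg in dead:
        del memory[seg]
    return dead


memory = {
    1: ["C1 data", Ptr(2)],      # an object referring to another
    2: [Ptr(6), 6],              # refers to segment 6; the bare 6 is just data
    6: ["N6 data"],              # an object of some other manager
    7: [Ptr(7)],                 # unreachable, even though it points to itself
}
freed = collect(memory, roots=[1])
```

No messages pass between object managers here; the single collector sees
every reference, which is the property the preferred solution relies on.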

V.D. Summary.

Chapter Five has presented the issues surrounding the existence of
multiple naming environments in a computing system. The effects of
multiple naming environments on system-wide naming, on transferring data
among name spaces, and on garbage collection (storage management) were
discussed. A "standard, minimal" filing system interface was described.

In the following, I look at my proposals, commenting on what they are
and "where I am," on their completeness, and on the trade-offs that have
been or could be made. I examine them with regard to previous work and
what "might be done." Lastly, I present my recommendations for further
research in the area of managing named, permanent objects in computing
systems which range in size from a single-user personal computer to a
distributed network composed of many autonomous hosts (which range in size
from personal computers, to multiple-user computing utilities, to networks
themselves).

VI.A. Summary.

This report has presented the results of an investigation into storing
things in modern computing systems. The investigation has produced a
design of a system called "Catoan," which is a manager for named, permanent
objects. Colloquially, such a manager could be considered an
object-oriented filing system.

A  description  of  existing  ways  of  viewing  permanent  storage  was 
presented  in  Chapter  Two,  describing  Honeywell's  Multics  and 
Hewlett-Packard's  MPE/3000  in  depth.  Bell  Telephone  Laboratories'  Unix  was 
briefly  described,  as  was  Carnegie-Mellon  University's  Hydra.  The  file 
systems  in  each  of  these  influenced  my  thinking  about  permanently  storing 
objects  in  a  computing  system.  A  few  methods  for  maintaining  versions  of 
objects  were  also  described  in  Chapter  Two. 

In Chapter Three, I described Catoan. The "Basic Catoan-object" was
defined and described, and a representation of the information in the
Catoan-object was presented. Refinements of the basic object were shown,
including an access control list protection scheme, cross-referencing, and
version maintenance. A general scheme for storing versions was described,
which allows the user to make the space-time trade-offs which most other
version maintenance schemes make for him.

An example of using Catoan was described in Chapter Four. A
Syspal-program object was built using the cross-referenced and versioned
Catoan-objects.

Chapter  Five  related  the  problems  which  occur  when  multiple  naming 
environments  exist  in  the  same  computing  system.  It  is  assumed  that  Catoan 
might  not  be  the  only  object  manager  in  the  computing  system,  and  that 
users  might  desire  to  transfer  information  among  object  managers  and  their 
naming  environments.  The  effects  of  multiple  naming  environments  on 
garbage  collection  were  also  stated. 

More  globally,  more  abstractly,  in  this  report  I  have  described  a  view 
of  storing  data  in  a  computing  system  which  departs  from  the  classical 
view.  I  have  made  this  departure  because  the  classical  views  of  data 
storage  are  not  amenable  to  many  of  the  current  philosophies  on 
programming,  software  engineering,  and  data  abstraction.  Catoan  allows  one 
to  think  of  data  storage  in  the  abstract;  it  allows  one  to  think  of 
storing  abstract  data  objects,  rather  than  storing  "piles  of  bits." 

Catoan is merely a type manager for a Catoan-object. However, it is a
rather odd type manager: it gives out references to portions of the
representation of its data, namely, a pointer to the CONTENTS. It is
this aspect of Catoan which makes it untrusted: part of the representation
of a Catoan-object is not secure.

VI.B. Completeness.

Catoan  has  also  been  a  vehicle  for  exploration.  Very  rarely  is  the 
permanent  data  storage  mechanism  of  a  computing  system  not  trusted.  Very 
rarely  do  multiple  filing  systems  exist  within  the  same  computing  system. 
Yet,  these  are  two  important  issues  in  the  design  of  Catoan. 

When one is exploring and experimenting, there is a good chance that
the results will not be perfect. So it is with Catoan. The decision that
Catoan need not be trusted, and will not be trusted, limits its use.
Because of the lack of trust, Catoan cannot enforce extended controls on
access to the data of a Catoan-object.

If one were to trust Catoan, and make Catoan the only object manager,
then other filing systems and naming environments could still exist.
However, rather than building directly on the memory management facilities,
the other filing systems would build on Catoan. Although this does solve
the trust issue, it introduces inefficiency by imposing another layer of
mechanism between the user and permanent storage. It may limit flexibility
if, in fact, a particular application is ill suited to Catoan (a
possibility if for no other reason than that Catoan is not implemented).

The naming scheme of Catoan allows a network of Catoan-objects to be
built. This introduces additional complexity by making it more difficult
to traverse the naming environment. When writing a program to traverse a
tree, it is known that there will be no loops encountered during the
traversal. But, when traversing a network, it is possible to encounter a
loop; therefore, loop detection is needed. However, the additional
flexibility gained by allowing multiple parents and, therefore, a naming
network often outweighs the cost of additional traversing complexity.
Furthermore, because a network is a superset of a hierarchy, a naming
hierarchy can be used, forgoing the generality (and cost) of a network.
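
The extra cost of traversing a network rather than a tree amounts to
remembering which objects have already been visited. The following Python
sketch shows this; the dictionary representation of the naming network
and the object names are assumptions made for the example.

```python
def traverse(root, children):
    """Visit every object reachable from `root` exactly once.

    `children` maps an object name to the names of its sub-objects.
    The `seen` set is the loop detection; in a tree it would be unnecessary.
    """
    seen, order, stack = set(), [], [root]
    while stack:
        obj = stack.pop()
        if obj in seen:
            continue                  # already visited: a second parent or a loop
        seen.add(obj)
        order.append(obj)
        stack.extend(reversed(children.get(obj, [])))
    return order


network = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D", "A"],   # D has two parents; C -> A closes a loop
}
visited = traverse("A", network)
```

Without the `seen` set the loop A, C, A would never terminate; with it,
each object is reported once regardless of how many parents it has.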


Catoan has no concept, analogous to the soft link, of associating an
external name with another external name. Catoan recognizes only hard
links, and multiple parents of an object. There are semantics of soft
links which cannot be modeled using hard links. For example, allowing a
user to use the same (local) name for some object, regardless of the
modifications made to the object, is much easier using soft links. If it
is possible at all with hard links (and this depends on the type of
internal name to which a hard link translates an external name),
substitution is usually much more visible to the unconcerned user than with
soft links. Nonetheless, because changing the CONTENTS of a Catoan-object
does not affect the containing objects, the "soft substitution" provided by
soft links is easier to approach with Catoan hard links than with, for
example, Unix hard links.

The Catoan philosophy would dictate that, because of uniformity, each
object should contain a section for soft links, if they were to be included
in Catoan. An alternative is to introduce a new type of Catoan-object, a
"soft_link." This points out another feature of Catoan: there is only one
type of Catoan-object. This forces the overhead of both portions on all
the users of Catoan, even if eighty-seven percent of their objects do not
use the CONTENTS.

One of the most important questions to be answered about Catoan is:
"Can one do everything with Catoan that one can do with 'conventional' file
systems?" I claim that, except for issues of trust, one can, and that, in
fact, one can do some things in Catoan that cannot be done in many existing
file systems. As to trust, the overhead operations are most greatly
impacted by not trusting Catoan -- the SYSTEM OVERHEAD INFORMATION is not
necessarily correct.

The data-oriented operations in Catoan are the "CONTENTS" operations,
described in Section III.B.1.b. The operations are very simple, and from
their simplicity comes much generality. Also, because of the lack of
constraints on the structure of the CONTENTS, anything which can be
described in Syspal can be stored directly in a Catoan-object. (It can be
argued that Syspal's data description facilities are universal; such
arguments are outside the coverage of this report.)

Because Catoan allows an arbitrary network of objects in its naming
structure,  relationships  which  cannot  be  expressed  in  some  other  systems 
(for  example,  hierarchical  naming  environments)  can  be  easily  expressed  in 
Catoan.  Objects  can  be  composed  of  sub-objects,  which  may  themselves  be 
composed  of  further  sub-objects,  any  of  which  (at  any  level)  may  be  part  of 
other  objects. 

In the basic Catoan-object, there is no provision for enforcing
protection (except at the CONTENTS's type level, which is somewhat clumsy).
Protection is, however, introduced as a refinement. This refinement is
merely a suggestion, and is presented as such to reinforce its
optionality. For similar reasons, cross-references and version maintenance
schemes are extensions and refinements, and are not critical to the basic
theory.

No mechanisms for concurrency control have been presented in this
report. This is because there are very many schemes, ranging from
"classical" locks, to monitors [14], to semaphores, to event counts [25],
to some very recent, perhaps esoteric schemes aimed primarily at
distributed systems [24]. If one desired to implement concurrency control
atop the basic Catoan-object, or any of its refinements, this could be
done, and should not impact the abstractions which exist.

When designing a computing system, recovery from semi-catastrophic
failures and from human errors is often considered. The concept of
off-line backup of on-line storage is crucial to a system which purports to
be a safe repository for its users' data [31]. However, backup is not
discussed in this report. To make Catoan complete, some form of off-line
backup must be included, at some level. This was not done here because of
the implications that lack of trust has on the ability to access data so as
to transfer it to off-line backup. If Catoan is, in fact, not trusted, the
task of backup must be relegated to the memory manager, which is trusted,
or to a higher level abstraction which is in a better position to implement
backup when it is needed.

VI.C. Trade-Offs.

An implicit trade-off has been responsibility for memory management.
Most filing systems perform their own buffering between primary and
secondary memory; Catoan relies on the underlying memory management system
for this. While this certainly simplifies Catoan, and helps support the
multiple-level, abstract system concept [23, 36, 2], there may be a
sacrifice in control over buffer management, resulting in a decrease in
system performance.

In  a  memory  system  which  is  "automatically"  managed,  the  performance 
degradation  will  generally  be  local,  visible  only  to  the  user  of  Catoan 
whose  application  would  benefit  from  detailed  control  over  the  buffer 
management. However, such local control will often result in degraded
global  performance,  because  the  memory  (buffer)  manager,  which  has  more 
global  information  than  the  filing  system,  is  being  circumvented. 

An instance of the "classical space-time trade-off" can be found in
version maintenance. One has the option of very fast access to any version
(at the expense of storing each version in its entirety), or of very little
storage (at the expense of building the requested version from a "base" by
applying "updates"). This trade-off has been left to the user of Catoan's
version maintenance system, by allowing him to specify a "base," a set of
"updates," and a program to apply the updates to the base. See
Section III.D for further details.
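
The trade-off can be seen in a small sketch of a versioned object built
from a base, a list of updates, and a user-supplied program. The sketch
is in Python and is not the scheme of Section III.D itself: the class
VersionedObject, the "keep_full" option, and the text-replacement update
format are assumptions chosen to keep the example short.

```python
class VersionedObject:
    """Hypothetical versioned object: a base, a list of updates, and a program."""

    def __init__(self, base, apply_update):
        self.updates = []
        self.apply_update = apply_update   # program supplied by the user
        self.full = {0: base}              # versions stored in their entirety

    def add_version(self, update, keep_full=False):
        self.updates.append(update)
        number = len(self.updates)
        if keep_full:                      # spend space now to save time later
            self.full[number] = self.version(number)
        return number

    def version(self, number):
        if not 0 <= number <= len(self.updates):
            raise IndexError(number)
        if number in self.full:
            return self.full[number]       # fast path: stored whole
        start = max(n for n in self.full if n < number)   # nearest stored version
        value = self.full[start]
        for update in self.updates[start:number]:
            value = self.apply_update(value, update)       # rebuild on demand
        return value


doc = VersionedObject("draft one", lambda text, u: text.replace(u[0], u[1]))
v1 = doc.add_version(("one", "two"))                      # stored as an update only
v2 = doc.add_version(("draft", "final"), keep_full=True)  # stored in its entirety
```

Storing every version whole (keep_full=True throughout) gives the fastest
access; storing none gives the least storage; the user chooses per version.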


The view of stored objects presented by Catoan is very unlike that
presented by most existing object managers (filing systems). Usually,
stored objects are viewed as a one-dimensional array of records (byte
strings). This view allows the object to be accessed in pieces, rather than
requiring that it be accessed in its entirety (as far as the object manager
is concerned). Catoan's decision to handle the CONTENTS only as a whole
allows objects to be viewed abstractly, and to have an internal structure
which is unknown to Catoan. If a more classical view is desired (because,
for example, most of the objects are very large, and one generally wants to
access only a small portion of them, anyway), a record-at-a-time view
could be built atop Catoan, using Catoan to actually store the object.
Because Catoan's CONTENTS_READ operation returns a pointer to the contents,
rather than the entire contents itself, such a system would not require
modification to Catoan, nor would it generate excessive memory referencing
from reading in the entire contents.
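
A record-at-a-time view of this kind might look like the following
sketch. It is in Python, and both classes are hypothetical: CatoanObject
models only the two CONTENTS operations, and RecordFile assumes
fixed-length records held in a byte array. The point illustrated is that
the layer works through the reference returned by contents_read and never
copies the whole CONTENTS.

```python
class CatoanObject:
    """Stand-in for a Catoan-object; only the CONTENTS operations are modeled."""

    def __init__(self, contents):
        self._contents = contents

    def contents_read(self):
        return self._contents          # a reference to the CONTENTS, not a copy

    def contents_set(self, cont):
        self._contents = cont


class RecordFile:
    """Record-at-a-time view layered on an unmodified CatoanObject."""

    def __init__(self, obj, record_size):
        self._view = memoryview(obj.contents_read())   # no copy is made
        self._size = record_size

    def __len__(self):
        return len(self._view) // self._size

    def _bounds(self, i):
        if not 0 <= i < len(self):
            raise IndexError(i)
        return i * self._size, (i + 1) * self._size

    def read_record(self, i):
        lo, hi = self._bounds(i)
        return bytes(self._view[lo:hi])    # touches only this record

    def write_record(self, i, data):
        if len(data) != self._size:
            raise ValueError("record has the wrong length")
        lo, hi = self._bounds(i)
        self._view[lo:hi] = data           # updates the CONTENTS in place


obj = CatoanObject(bytearray(b"AAAABBBBCCCC"))
records = RecordFile(obj, record_size=4)
second = records.read_record(1)
records.write_record(2, b"DDDD")
```

Because the view writes through the same reference, the change is visible
to any later contents_read on the object.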

What  happens  if  some  portion  of  memory  is  volatile?  How  must  Catoan  be 
changed  so  that  a  user  can  be  assured  that  his  data  is  in  stable  storage? 
Catoan  must  provide  the  user  with  a  MAKE_NON_VOLATILE  operation  which 
performs  a  "synchronous  write"  so  that,  upon  termination  of  the  operation, 
the  user  is  assured  that  the  object  has  been  transferred  to  non-volatile 
storage. This requires that a similar operation exist for the memory
manager, since the view it presents to Catoan is that of non-volatile
storage.
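
On a present-day operating system, the nearest analogue of such a
"synchronous write" is flushing a file and asking the system to force it
to the device. The Python sketch below illustrates the idea under that
assumption; mapping a Catoan-object onto a file, and the function name
make_non_volatile, are choices made for the example and are not part of
Catoan. Here os.fsync plays the role of the "similar operation" required
of the lower level.

```python
import os
import tempfile


def make_non_volatile(path, data):
    """Write `data` to `path` and return only after the operating system has
    been asked to force it to stable storage (a "synchronous write")."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()                # move the program's buffer into the OS
        os.fsync(f.fileno())     # ask the OS to push its buffers to the device


with tempfile.TemporaryDirectory() as directory:
    path = os.path.join(directory, "object.bin")
    make_non_volatile(path, b"CONTENTS")
    with open(path, "rb") as f:
        stored = f.read()
```

How strong the resulting guarantee is depends on the operating system and
the storage hardware, which mirrors the dependence on the memory manager
noted above.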

A very important trade-off is that of trust. Because Catoan need not
be trusted, the information in the DATES and PRINCIPALS fields may be
inaccurate. Lack of trust implies a certain difficulty in enforcing
security and in implementing backup, and implies certain uncontrolled
accessibility to Catoan-objects (in particular, to the CONTENTS). But,
Catoan is optional. If Catoan provides protection mechanisms, if Catoan is
secure, then it must be trusted, and it probably becomes mandatory.

VI.D. Remaining Work.

Much has been done on and with Catoan. Much is left to do: more
theory needs developing, and practical experience needs to be gained with
the concepts embodied in Catoan. This section describes some of the work which
remains  to  be  done  relating  to  Catoan  and  the  ideas  presented  in  this 
report. 


As mentioned in Chapter One, Catoan might be used on a machine which is
part of a multi-node network. In such an environment, one often wants to
name resources which exist at remote nodes. Furthermore, one often wants
to locate a resource thought to exist somewhere in the network, but at an
unknown node. Despite the need for investigation into this area, this
report on Catoan does not address network-wide filing systems or naming
environments. One possible view of a network-wide filing system built
using Catoan is to consider the remote nodes as representing other members
of a collection of multiple naming environments. It might then be possible
to apply the concepts presented in Chapter Five to the problems of
network-wide filing systems.

Issues  of  protection,  security,  and  sharing  are  relevant  to  the  goals 
of  Catoan.  These  have  been  discussed  briefly  throughout  this  report; 
additional  work  is  needed  to  present  a  unified  view  of  protection  and 
sharing  to  the  users  of  Catoan  that  is  both  convenient  and  powerful. 

As discussed in Section III.C.2, when implementing cross-references
there  is  a  problem  of  who  pays  for  the  storage  occupied  by  the 
cross-reference  information.  This  is  part  of  a  more  global  problem  of  how 
to  determine  the  amount  of  storage  in  one  principal's  space  which  is 
occupied  by  the  data  of  another  principal  (including  "The  System").  I  know 
of  no  previous  work  done  in  this  area. 

Designing  a  system  which  is  robust  in  the  face  of  host-system  failures 
is  still  a  large  open  research  question.  Because  Catoan  manages  permanent 
data  objects,  it  should  provide  stability  in  the  face  of  failure. 

Lastly, how might one implement Catoan? How difficult would it be? Is
the environment Catoan presents to its users really the right one? Is
Catoan complete, sufficient, and easy to use? Only an attempted
implementation can answer these questions.

This appendix summarizes the salient features of Syspal (1) [10] as
they relate to this presentation. The reader is warned that this is not a
definitive explanation of the language, nor is it complete. The reader is
warned further that this represents Syspal as I knew it in May, 1979, while
the language was still undergoing active development. The language as it
actually is defined at the time this paper is read, or even published, may
differ substantially from the summary presented here.


Syspal is a data-abstraction language, based on Pascal, and geared
toward  systems  programming.  Much  of  the  syntax  and  semantics  are  derived 
from  Pascal,  and  from  CLU.  One  of  the  design  goals  of  Syspal  is  to  support 
modular  programming  conveniently. 


(1)  Syspal  is  an  experimental  programming  language  under  development  at 
Hewlett-Packard  Laboratories,  Electronics  Research  Center,  Computer 
Research  Laboratory,  in  Palo  Alto,  California. 
Syspal provides the programmer with a few standard, "built-in" data
types. Various forms of enumeration types, which specify the range of
values of a type, are available. Using enumerations, the usual INTEGER,
REAL, BOOL, and CHAR types can be defined. For example, INTEGER might be
defined

    INTEGER = TYPE -1000000 TO 1000000

if INTEGERs between positive and negative one million were desired. The
REAL type might be

    REAL = TYPE PRECISION 6 EXPONENT 32

stating that six digits of precision and an exponent between positive and
negative thirty-two were available. BOOL, representing truth and falsehood,
could be defined

    BOOL = TYPE UNORDERED(TRUE, FALSE)

where UNORDERED specifies that the relations based on order (less, greater,
et ceterae) are not defined on BOOLs (though equal and not equal still
are). The CHAR type represents the ASCII character set, and is an ORDERED
collection of the values according to the ASCII collating sequence.
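
As a rough illustration only, the scalar definitions above can be imitated
in Python. This is not Syspal, and the names "make_range_type", "Integer",
and "Bool" are hypothetical. The sketch assumes that a range type rejects
out-of-range values when a value is created; Syspal may well perform such
checks at compile time instead.

```python
from enum import Enum


def make_range_type(name, low, high):
    """Return an int subclass that only accepts values in low..high."""

    class RangeInt(int):
        def __new__(cls, value):
            if not low <= value <= high:
                raise ValueError(f"{value} is outside {name} {low}..{high}")
            return super().__new__(cls, value)

    RangeInt.__name__ = name
    return RangeInt


# Rough analogue of an INTEGER type limited to plus or minus one million.
Integer = make_range_type("INTEGER", -1_000_000, 1_000_000)


class Bool(Enum):
    """Rough analogue of an UNORDERED type: equality works, ordering does not."""

    TRUE = "true"
    FALSE = "false"
```

In this sketch, comparing Bool.TRUE with Bool.FALSE for equality is allowed,
while an ordering comparison such as "less than" raises an error, which is
the behavior the text attributes to UNORDERED. Note that arithmetic on an
Integer yields an ordinary Python int, so results are not re-checked.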

In addition to the scalar types, aggregates are provided by Syspal.
Two kinds of aggregates exist: RECORDS and ARRAYS. ARRAYS are homogeneous
collections of elements which can be referenced using numeric subscripts.
A definition like

    x: ARRAY(1 TO 6) OF INTEGER

defines  "x"  to  be  a  six  element  ARRAY  of  INTEGERS.  The  declaration 

    y: ARRAY(*) OF CIRCULAR(0, 1, 2)

specifies  "y"  as  an  array  with  unknown  size  of  modulo-three  integers. 
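
The idea of an array of modulo-three integers can be sketched in Python as
follows. The class name "Circular" and its wrap-around arithmetic are
assumptions made for illustration; the text says only that the elements are
modulo-three integers and that the array size is unknown.

```python
class Circular:
    """An integer that wraps around within 0 .. modulus-1."""

    def __init__(self, modulus, value=0):
        self.modulus = modulus
        self.value = value % modulus

    def __add__(self, step):
        # Adding past the largest value wraps back to zero.
        return Circular(self.modulus, self.value + step)

    def __eq__(self, other):
        return (isinstance(other, Circular)
                and (self.modulus, self.value) == (other.modulus, other.value))

    def __repr__(self):
        return f"Circular({self.modulus}, {self.value})"


# "y" in the text has an unknown size; a Python list simply grows as needed.
y = [Circular(3, n) for n in range(5)]
```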


RECORDS allow non-homogeneous data to be included in the same
aggregate.  The  elements  of  RECORDS  are  accessed  by  their  field  names.  For 
example,  suppose  the  following  definition  were  part  of  a  Syspal  program: 


    employee: RECORD
        name: string(30);
        addr: RECORD
            street: string(35);
            city_state: string(35);
            zip_code: 0 TO 99999;
        END; !address
        salary: 10000 TO 500000;
        monthly_productivity: ARRAY(1 TO 12) OF 0 TO 10;
    END; !employee


This defines the variable "employee" to contain four fields: "name" (a
character-string of length thirty; see Section I.C for a definition of
strings); "addr" (which itself is a RECORD, consisting of two thirty-five
character strings and a non-negative integer less than 100,000); "salary"
(an integer between 10,000 and 500,000); and "monthly_productivity" (which
is another aggregate: an ARRAY containing twelve elements, each of which
is an integer between zero and ten).
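
For readers more familiar with current languages, the "employee" record
might be approximated with Python dataclasses as sketched below. The helper
"require" and the sample values are hypothetical, and the sketch assumes
that string(N) means "at most N characters" and that range violations are
detected when the record is built.

```python
from dataclasses import dataclass
from typing import List


def require(condition, message):
    """Raise ValueError when a field constraint is violated."""
    if not condition:
        raise ValueError(message)


@dataclass
class Address:
    street: str        # string(35)
    city_state: str    # string(35)
    zip_code: int      # 0 TO 99999

    def __post_init__(self):
        require(len(self.street) <= 35, "street longer than 35 characters")
        require(len(self.city_state) <= 35, "city_state longer than 35 characters")
        require(0 <= self.zip_code <= 99999, "zip_code outside 0..99999")


@dataclass
class Employee:
    name: str                          # string(30)
    addr: Address                      # nested record
    salary: int                        # 10000 TO 500000
    monthly_productivity: List[int]    # ARRAY(1 TO 12) OF 0 TO 10

    def __post_init__(self):
        require(len(self.name) <= 30, "name longer than 30 characters")
        require(10000 <= self.salary <= 500000, "salary outside 10000..500000")
        require(len(self.monthly_productivity) == 12, "need twelve monthly values")
        require(all(0 <= p <= 10 for p in self.monthly_productivity),
                "productivity outside 0..10")


# Sample values are invented for illustration.
employee = Employee(
    name="A. N. Example",
    addr=Address("1 Hypothetical Way", "Palo Alto, CA", 94304),
    salary=25000,
    monthly_productivity=[5] * 12,
)
```

Fields of the nested record are reached by name, as in the text: the
expression employee.addr.zip_code selects the zip code.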
In addition to being able to define variables, the Syspal programmer is
allowed  to  define  new  types.  This  is  done  in  the  same  way  that  INTEGER, 
REAL,  et  ceterae  were  defined  above.  For  example, 

    address = TYPE RECORD
        street: string(35);
        city_state: string(35);
        zip_code: 0 TO 99999;
    END; !address

defines a type called "address," with the same structure as the "addr"
field in the "employee" structure above (also called "employee.addr"). A
programmer-defined type (call it "PDTP") can be an extension of some other
type (the "base type," call it BTP), meaning that PDTP is built on BTP and
"extends" it. Unless specifically prohibited, an extension of a type will
match the base type for the purpose of compile-time type checking.

Defined types can have user-specified parameters, as shown in the
definition  of  the  "string"  type  found  in  Section  I.C.  Parameters  are  very 
useful  when  defining  modules,  such  as  a  stack  consisting  of  INTEGERS,  or  of 
REALs;  see  below  for  a  discussion  of  modules. 


One  can  also  define  a  variable  or  type  as  the  UNION  of  two  or  more 
types. This specifies that any of the base types might be the type of the
defined  variable. 
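
A loose Python analogue of a UNION is a type hint naming the alternatives,
with the actual type discovered when the value is used. The names
"IntOrText" and "describe" are hypothetical, and this summary does not say
how Syspal itself discriminates among the base types, so the sketch makes
no claim about that mechanism.

```python
from typing import Union

# A value of this type may be an integer or a character string.
IntOrText = Union[int, str]


def describe(value: IntOrText) -> str:
    """Report which of the base types the value currently has."""
    if isinstance(value, int):
        return f"integer {value}"
    if isinstance(value, str):
        return f"text {value!r}"
    raise TypeError("value is not one of the base types of the union")
```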

Syspal provides pointers. Pointers are typed, and can refer to only
one kind of object (as opposed to PL/I pointers, which can reference
anything). A pointer to an INTEGER is declared

pint:  ^INTEGER; 

and  a  pointer  to  an  address  would  be 

paddr:  ^address; 

If  the  value  of  "pint"  were  assigned  to  "paddr,"  an  error  would  be  raised. 
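
The effect of typed pointers can be imitated in Python with a small wrapper
class, sketched below. "TypedPointer" and its methods are hypothetical
names. The sketch checks types when the program runs; the text does not say
when Syspal raises the error, so that timing is an assumption.

```python
class TypedPointer:
    """A reference that can only refer to values of one declared type."""

    def __init__(self, target_type):
        self.target_type = target_type
        self.target = None  # refers to nothing yet

    def point_to(self, value):
        if not isinstance(value, self.target_type):
            raise TypeError(f"cannot point to {type(value).__name__}")
        self.target = value

    def assign_from(self, other):
        """Copy another pointer's value; the declared types must match."""
        if other.target_type is not self.target_type:
            raise TypeError("pointer types do not match")
        self.target = other.target


class Address:
    """Stand-in for the "address" type; its fields are omitted here."""


pint = TypedPointer(int)
paddr = TypedPointer(Address)
pint.point_to(42)
```

With these definitions, paddr.assign_from(pint) raises a TypeError, which
mirrors the error described above.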

Control  Structures. 

Most of the "usual" flow control constructs exist in Syspal.
Conditionals (IF-THEN-ELSE and CASE), iteration (WHILE, REPEAT, FOR, and
LOOP [infinite repetition]), exception handling (EXCEPTION), and procedure
calling (CALL), among others, are provided. In addition, iteration can be
controlled by a "sequencer" (1), which is a co-routine that provides the
next value for each iteration.


(1)  This  is  very  similar  to  the  CLU  "iterator"  [22]. 
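
Python generators behave much like the co-routine described here, so they
offer a convenient way to picture a sequencer. The following is a sketch by
analogy, not Syspal syntax; "countdown_from" is a hypothetical name.

```python
def countdown_from(top):
    """Yield top, top-1, ..., 1, one value per loop iteration."""
    current = top
    while current >= 1:
        yield current      # hand one value to the loop, then pause here
        current -= 1


visited = []
for position in countdown_from(3):
    # The loop body runs once for each value the generator supplies.
    visited.append(position)
```

The loop and the generator take turns: each pass through the loop resumes
the generator just long enough to obtain the next value.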
Procedure and Function Definition and Calling.

Procedure  declarations  have  the  form: 

    name: PROCEDURE(parm1: type1p, parm2: type2p, ...)
          RETURNS(var1: type1v, var2: type2v, ...)
          EXCEPTION(cond1(exvars1), cond2(exvars2), ...);

This defines a PROCEDURE called name. The parameters are parmN (N being 1,
2, et ceterae), of types typeNp. The procedure returns values of types
typeNv through the internal names varN. Exceptional conditions condN can
be raised in this procedure; they will return with parameters exvarsN,
respectively. The parameters, RETURNS clause, EXCEPTION clause, and exvars
portion of the exceptional conditions ("condN") are optional.

As  mentioned  in  Section  I.C,  Syspal  recognizes  the  type  of  the  implicit 
operand  to  module  operations,  and,  furthermore,  assigns  this  implicit 
operand  to  the  keyword  "SELF."  Type  checking  is  performed  for  calling 
sequences,  as  well  as  for  other  variable  references. 

In addition to a normal procedure termination, an abnormal termination
can occur. There is only one way for a procedure or function to terminate
normally: assign values to the RETURNS variables defined in the procedure
or function header (if any exist), and exit through the end of the
procedure or function. An abnormal termination is indicated by the RETURN
statement. Abnormal termination can, in addition to returning the name of
the exceptional condition, return values which can be used by the calling
procedure to diagnose the error.
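
The closest familiar analogue to an exceptional return that carries
diagnostic values is an exception object with fields. The Python sketch
below is illustrative only: "OutOfRange", "to_zip_code", and "describe_zip"
are hypothetical names, and Python's raise statement merely stands in for
the Syspal RETURN statement.

```python
class OutOfRange(Exception):
    """Exceptional condition that carries values back to the caller."""

    def __init__(self, value, low, high):
        super().__init__(f"{value} not in {low}..{high}")
        self.value, self.low, self.high = value, low, high


def to_zip_code(value):
    if not 0 <= value <= 99999:
        # Abnormal termination: name the condition and attach diagnostics.
        raise OutOfRange(value, 0, 99999)
    # Normal termination: set the result, then leave through the end.
    result = value
    return result


def describe_zip(value):
    try:
        return f"zip {to_zip_code(value):05d}"
    except OutOfRange as exc:
        # The caller uses the returned values to diagnose the error.
        return f"rejected {exc.value}; allowed {exc.low}..{exc.high}"
```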

Modularity,  Data  Abstractions,  and  Interfaces. 

Syspal is a data-abstraction language, similar to CLU [22], for
example.  The  Syspal  analogue  to  the  CLU  cluster  is  a  "module."  When  one 
defines  an  abstract  data  type,  one  does  so  by  defining  the  module  which 
will  manage  the  abstraction.  Variables  of  the  abstract  type  are  then 
declared  to  be  of  the  module's  type. 

The  abstraction  is  defined  by  the  "interface"  of  the  module.  The 
interface  defines  those  things  (operations,  constants,  type  declarations, 
et  ceterae)  which  are  to  be  visible  to  users  of  the  abstraction;  all  other 
information  about  the  module  is  invisible  to  all  but  the  module  itself.  A 
module  can  have  many  interfaces;  for  example,  the  creator  of  an  object 
might  be  able  to  modify  the  object,  but  he  might  not  want  others  to  be  able 
to  modify  it,  only  to  read  it.  Figure  12  shows  the  definition  for  a  module 
implementing  a  STACK  abstraction.  The  module  definition,  including  the 
operations  and  representation,  and  three  interfaces  are  presented. 
MODULE stack(element_type: TYPE, stack_lim: INTEGER):
       stack, strict_stack, loose_stack, pseudo_stack;

    new: PROCEDURE
         RETURNS(stk: @stack)
        (* Creates a new STACK, of "type" ELEMENT_TYPE, with
           STACK_LIM elements (maximum). *);
        ALLOCATE SELF;
        SELF.tos := 0;
        stk := EXT(SELF);
    END PROCEDURE; !new

    push: PROCEDURE(val: element_type)
          EXCEPTION(stack_overflow)
        (* Puts VAL onto the top of the stack. *);
        IF SELF.tos=stack_lim THEN
            RETURN(stack_overflow);
        ELSE SELF.tos :=# +1;
            SELF.elements(SELF.tos) := val;
        END;
    END PROCEDURE; !push

    pop: PROCEDURE
         RETURNS(top: element_type)
         EXCEPTION(stack_underflow)
        (* Return and discard the top of the stack. *);
        IF SELF.tos=0 THEN
            RETURN(stack_underflow);
        ELSE top := SELF.elements(SELF.tos);
            SELF.tos :=# -1;
        END;
    END PROCEDURE; !pop

    is_empty: PROCEDURE
              RETURNS(ans: BOOL)
        (* Returns TRUE if the stack has no elements. *);
        ans := SELF.tos=0;
    END PROCEDURE; !is_empty

    make_empty: PROCEDURE
        (* Forces the stack to have no elements. *);
        i: INTEGER;
        operation_not_defined_on_type: EXCEPTION;

        i := 1;
        EXCEPTION
            ON operation_not_defined_on_type DO
                i := stack_lim+1;
        BEGIN
            WHILE i<=stack_lim DO
                SELF.elements(i) := NULL(element_type);
                i :=# +1;
            END;
        END;
        SELF.tos := 0;
    END PROCEDURE; !make_empty

    extract: PROCEDURE(index: INTEGER)
             RETURNS(elem: element_type)
             EXCEPTION(stack_nonexistent_element(size: 1 TO stack_lim))
        (* Returns the INDEXth-from-top element (top = 1). *);
        IF index>SELF.tos THEN
            RETURN(stack_nonexistent_element(SELF.tos));
        ELSE elem := SELF.elements(SELF.tos-(index-1));
    END PROCEDURE; !extract

    insert: PROCEDURE(val: element_type, index: INTEGER)
            EXCEPTION(stack_nonexistent_element(size: 1 TO stack_lim))
        (* Sets the INDEXth-from-top element to VAL (top = 1). *);
        IF index>SELF.tos THEN
            RETURN(stack_nonexistent_element(SELF.tos));
        ELSE SELF.elements(SELF.tos-(index-1)) := val;
    END PROCEDURE; !insert


    SELF: RECORD
        tos: 0 TO stack_lim;
        elements: ARRAY(1 TO stack_lim) OF element_type;
    END; !SELF

END MODULE; !stack
!Interface definitions.

(stack(element_type: TYPE, stack_lim: INTEGER),
 strict_stack(element_type: TYPE, stack_lim: INTEGER)): INTERFACE;
    new: PROCEDURE
         RETURNS(stk: @stack);
    push: PROCEDURE(val: element_type)
          EXCEPTION(stack_overflow);
    pop: PROCEDURE
         RETURNS(top: element_type)
         EXCEPTION(stack_underflow);
    is_empty: PROCEDURE
              RETURNS(ans: BOOL);
    stack_overflow, stack_underflow: EXCEPTION;
END INTERFACE; !stack, strict_stack


loose_stack(element_type: TYPE, stack_lim: INTEGER): INTERFACE;
    %VISIBLY_EXTENDS strict_stack(element_type, stack_lim);
    make_empty: PROCEDURE;
    extract: PROCEDURE(index: INTEGER)
             RETURNS(elem: element_type)
             EXCEPTION(stack_nonexistent_element(size: 1 TO stack_lim));
    stack_nonexistent_element(size: 1 TO stack_lim): EXCEPTION;
END INTERFACE; !loose_stack


pseudo_stack(element_type: TYPE, stack_lim: INTEGER): INTERFACE;
    %VISIBLY_EXTENDS loose_stack;
    insert: PROCEDURE(val: element_type, index: INTEGER)
            EXCEPTION(stack_nonexistent_element(size: 1 TO stack_lim));
END INTERFACE; !pseudo_stack


Figure  12:  A  Module  Implementing  a  Stack. 
The STACK module has two parameters, which define the type of the STACK's
elements ("element_type") and its maximum size ("stack_lim"). These
parameters are passed to STACK when a new STACK is created. They are
supplied by the programmer when the particular STACK variable is declared.
For  example, 

    inventory: stack;

    inventory := NEW stack(inven_control_record, 150);

declares "inventory" to be a STACK, and instantiates it as a stack of
"inven_control_records," with at most one hundred fifty
inven_control_records. The list of names after the last colon in the
MODULE statement is a list of the interfaces which this module meets.

The NEW operation, invoked by the NEW statement, initializes the fields
in the representation of the STACK, and returns the external (abstract)
representation of a stack ("EXT(SELF)").

PUSH and POP present no particular surprises. They do illustrate,
however, the exception-handling mechanisms of Syspal. The only way to
terminate the execution of a procedure normally is to exit through the last
statement of the procedure body, having previously assigned to the
appropriate variables whatever values are to be returned. If an
exceptional return is to be performed, the RETURN statement is used, naming
the exception, and specifying the parameters which might be returned with
the exception (see EXTRACT and INSERT).

The IS_EMPTY operation is a predicate to allow the user to see if the
stack has any elements. MAKE_EMPTY alters the stack to ensure that, if
IS_EMPTY were called immediately after MAKE_EMPTY, IS_EMPTY would return
TRUE.


EXTRACT and INSERT allow direct access to the elements of the stack.
If an undefined element is accessed, the exception
STACK_NONEXISTENT_ELEMENT is signalled, and the current size of the stack
is returned with the exception name.
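
To make the behavior of these operations concrete, the following Python
sketch mimics the module of Figure 12. It is an approximation for
illustration, not a translation: the class and exception names are my own
rendering, element types are checked at run time rather than at compile
time, and an added check rejects an index smaller than one, which Figure 12
does not test for.

```python
class StackOverflow(Exception):
    pass


class StackUnderflow(Exception):
    pass


class StackNonexistentElement(Exception):
    def __init__(self, size):
        super().__init__(f"stack holds only {size} element(s)")
        self.size = size  # current stack size, returned with the exception


class Stack:
    def __init__(self, element_type, stack_lim):
        self.element_type = element_type
        self.stack_lim = stack_lim
        self.tos = 0                          # 0 means the stack is empty
        self.elements = [None] * stack_lim    # elements[i - 1] is element i

    def push(self, val):
        if not isinstance(val, self.element_type):
            raise TypeError("value is not of the stack's element type")
        if self.tos == self.stack_lim:
            raise StackOverflow()
        self.tos += 1
        self.elements[self.tos - 1] = val

    def pop(self):
        if self.tos == 0:
            raise StackUnderflow()
        top = self.elements[self.tos - 1]
        self.tos -= 1
        return top

    def is_empty(self):
        return self.tos == 0

    def make_empty(self):
        self.elements = [None] * self.stack_lim
        self.tos = 0

    def _slot(self, index):
        # index counts from the top; the top element has index 1.
        if index < 1 or index > self.tos:
            raise StackNonexistentElement(self.tos)
        return self.tos - index

    def extract(self, index):
        return self.elements[self._slot(index)]

    def insert(self, val, index):
        self.elements[self._slot(index)] = val


# Sample data is invented for illustration.
inventory = Stack(str, 3)
for item in ("bolts", "nuts", "washers"):
    inventory.push(item)
```

Here inventory.extract(1) yields the most recently pushed item, a fourth
push raises StackOverflow, and asking for an element beyond the current top
raises StackNonexistentElement with the current size attached.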

The interfaces allow various forms of access to the STACK abstraction
(module). If a strict stack discipline is desired (access to only the top
of the stack), the "stack" or "strict_stack" interface would be used. If a
slightly looser stack discipline is desired, allowing writing only through
PUSH but reading anywhere in the stack, "loose_stack" would be used. If
the convenience of a stack were desired without any controls over its use,
the "pseudo_stack" interface would be appropriate.
Note that the "loose_stack" and "pseudo_stack" interfaces are built on
other interfaces. The "%VISIBLY_EXTENDS" statement specifies that the
named interface should be considered as part of this interface, and that
this interface extends it. It further specifies that all information in
the extended interface should be explicitly visible to the user. (In
contrast, %EXTENDS would allow the extending interface access to the
operations of the extended interface, but would not allow the user access
to the information in the extended interface unless it was explicitly
given.)
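
One way to picture several interfaces onto a single module is as a family
of wrapper "views," each exposing a larger set of operations than the one
it is built on. The Python sketch below is only an analogy: the class names
are hypothetical, subclassing stands in for the visibly-extends
relationship, and Python hides the underlying object by convention only,
whereas a Syspal interface presumably enforces the restriction.

```python
class ListStack:
    """Minimal stand-in for the module; every operation is defined here."""

    def __init__(self):
        self._items = []

    def push(self, val):
        self._items.append(val)

    def pop(self):
        return self._items.pop()

    def is_empty(self):
        return not self._items

    def make_empty(self):
        self._items.clear()

    def extract(self, index):
        return self._items[-index]      # index 1 is the top

    def insert(self, val, index):
        self._items[-index] = val


class StrictStackView:
    """Access to the top of the stack only."""

    def __init__(self, impl):
        self._impl = impl

    def push(self, val):
        self._impl.push(val)

    def pop(self):
        return self._impl.pop()

    def is_empty(self):
        return self._impl.is_empty()


class LooseStackView(StrictStackView):
    """Adds reading anywhere; inherited operations stay visible to users."""

    def make_empty(self):
        self._impl.make_empty()

    def extract(self, index):
        return self._impl.extract(index)


class PseudoStackView(LooseStackView):
    """Adds writing anywhere."""

    def insert(self, val, index):
        self._impl.insert(val, index)
```

A user handed a StrictStackView can push and pop but has no extract
operation to call, while a user handed a PseudoStackView over the same
underlying object can use every operation.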